Alexandr A. Borovkov 


Probability Theory 


Springer


Universitext 


Universitext 


Series Editors: 


Sheldon Axler 
San Francisco State University, San Francisco, CA, USA 


Vincenzo Capasso 
Università degli Studi di Milano, Milan, Italy


Carles Casacuberta 
Universitat de Barcelona, Barcelona, Spain 


Angus MacIntyre 
Queen Mary, University of London, London, UK 


Kenneth Ribet 
University of California, Berkeley, Berkeley, CA, USA 


Claude Sabbah 
CNRS, École Polytechnique, Palaiseau, France


Endre Süli
University of Oxford, Oxford, UK 


Wojbor A. Woyczyński
Case Western Reserve University, Cleveland, OH, USA 


Universitext is a series of textbooks that presents material from a wide variety 
of mathematical disciplines at master’s level and beyond. The books, often well 
class-tested by their author, may have an informal, personal, even experimental 
approach to their subject matter. Some of the most successful and established 
books in the series have evolved through several editions, always following the 
evolution of teaching curricula, into very polished texts. 


Thus as research topics trickle down into graduate-level teaching, first textbooks 
written for new, cutting-edge courses may make their way into Universitext. 


For further volumes: 
www.springer.com/series/223 


Alexandr A. Borovkov 


Probability Theory 


Edited by K.A. Borovkov 
Translated by O.B. Borovkova and P.S. Ruzankin 


Springer


Alexandr A. Borovkov 

Sobolev Institute of Mathematics and 
Novosibirsk State University 
Novosibirsk, Russia 


Translation from the 5th edn. of the Russian language edition:
‘Teoriya Veroyatnostei’ by Alexandr A. Borovkov 

© Knizhnyi dom Librokom 2009 

All Rights Reserved. 


1st and 2nd edn. © Nauka 1976 and 1986 
3rd edn. © Editorial URSS and Sobolev Institute of Mathematics 1999 
4th edn. © Editorial URSS 2003 


ISSN 0172-5939 ISSN 2191-6675 (electronic) 
Universitext 
ISBN 978-1-4471-5200-2 ISBN 978-1-4471-5201-9 (eBook) 


DOI 10.1007/978-1-4471-5201-9 
Springer London Heidelberg New York Dordrecht 


Library of Congress Control Number: 2013941877 
Mathematics Subject Classification: 60-XX, 60-01 


© Springer-Verlag London 2013 

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of 
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, 
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information 
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology 
now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection 
with reviews or scholarly analysis or material supplied specifically for the purpose of being entered 
and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of 
this publication or parts thereof is permitted only under the provisions of the Copyright Law of the 
Publisher’s location, in its current version, and permission for use must always be obtained from Springer. 
Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations 
are liable to prosecution under the respective Copyright Law. 

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication 
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant 
protective laws and regulations and therefore free for general use. 

While the advice and information in this book are believed to be true and accurate at the date of pub- 
lication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any 
errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect 
to the material contained herein. 


Printed on acid-free paper 


Springer is part of Springer Science+Business Media (www.springer.com) 


Foreword 


The present edition of the book differs substantially from the previous one. Over the 
period of time since the publication of the previous edition the author has accumu- 
lated quite a lot of ideas concerning possible improvements to some chapters of the 
book. In addition, some new opportunities were found for an accessible exposition 
of new topics that had not appeared in textbooks before but which are of certain 
interest for applications and reflect current trends in the development of modern 
probability theory. All this led to the need for one more revision of the book. As 
a result, many methodological changes were made and a lot of new material was 
added, which makes the book more logically coherent and complete. We will list 
here only the main changes in the order of their appearance in the text. 


• Section 4.4 “Expectations of Sums of a Random Number of Random Variables”
was significantly revised. New sufficient conditions for Wald’s identity were added. 
An example is given showing that, when summands are non-identically distributed, 
Wald’s identity can fail to hold even in the case when its right-hand side is well- 
defined. Later on, Theorem 11.3.2 shows that, for identically distributed summands, 
Wald’s identity is always valid whenever its right-hand side is well-defined. 


• In Sect. 6.1 a criterion of uniform integrability of random variables is
constructed, which simplifies the use of this notion. For example, the criterion directly
implies uniform integrability of weighted sums of uniformly integrable random vari- 
ables. 


• Section 7.2, which is devoted to inversion formulas, was substantially expanded
and now includes assertions useful for proving integro-local theorems in Sect. 8.7. 


• In Chap. 8, integro-local limit theorems for sums of identically distributed
random variables were added (Sects. 8.7 and 8.8). These theorems, being substantially
more precise assertions than the integral limit theorems, do not require additional 
conditions and play an important role in investigating large deviation probabilities 
in Chap. 9. 




• A new chapter was written on probabilities of large deviations of sums of
random variables (Chap. 9). The chapter provides a systematic and rather complete
exposition of the large deviation theory both in the case where the Cramér condition 
(rapid decay of distributions at infinity) is satisfied and where it is not. Both integral 
and integro-local theorems are obtained. The large deviation principle is established. 


• Assertions concerning the case of non-identically distributed random variables
were added in Chap. 10 on “Renewal Processes”. Among them are renewal theo- 
rems as well as the law of large numbers and the central limit theorem for renewal 
processes. A new section was written to present the theory of generalised renewal 
processes. 


• An extension of the Kolmogorov strong law of large numbers to the case
of non-identically distributed random variables having the first moment only was 
added to Chap. 11. A new subsection on the “Strong law of large numbers for gen- 
eralised renewal processes” was written. 


• Chapter 12 on “Random walks and factorisation identities” was substantially
revised. A number of new sections were added: on finding factorisation components 
in explicit form, on the asymptotic properties of the distribution of the suprema of 
cumulated sums and generalised renewal processes, and on the distribution of the 
first passage time. 


• In Chap. 13, devoted to Markov chains, a section on “The law of large numbers
and central limit theorem for sums of random variables defined on a Markov chain” 
was added. 


• Three new appendices (6, 7 and 8) were written. They present important
auxiliary material on the following topics: “The basic properties of regularly varying
functions and subexponential distributions”, “Proofs of theorems on convergence to 
stable laws”, and “Upper and lower bounds for the distributions of sums and maxima 
of sums of independent random variables”. 


As has already been noted, these are just the most significant changes; there are 
also many others. A lot of typos and other inaccuracies were fixed. The process of 
creating new typos and misprints in the course of one’s work on a book is random 
and can be well described mathematically by the Poisson process (for the
definition of Poisson processes, see Chaps. 10 and 19). An important characteristic of the
quality of a book is the intensity of this process. Unfortunately, I am afraid that in 
the two previous editions (1999 and 2003) this intensity perhaps exceeded a certain 
acceptable level. Not renouncing his own responsibility, the author still admits that 
this may be due, to some extent, to the fact that the publication of these editions took 
place at the time of a certain decline of the publishing industry in Russia related to 
the general state of the economy at that time (in the 1972, 1976 and 1986 editions
there were far fewer such defects).




Before starting to work on the new edition, I asked my colleagues from our lab- 
oratory at the Sobolev Institute of Mathematics and from the Chair of Probability 
Theory and Mathematical Statistics at Novosibirsk State University to prepare lists 
of any typos and other inaccuracies they had spotted in the book, as well as sug- 
gested improvements of exposition. I am very grateful to everyone who provided 
me with such information. I would like to express special thanks to I.S. Borisov,
V.I. Lotov, A.A. Mogul’sky and S.G. Foss, who also offered a number of
methodological improvements.

I am also deeply grateful to T.V. Belyaeva for her invaluable assistance in type- 
setting the book with its numerous changes. Without that help, the work on the new 
edition would have been much more difficult. 


A.A. Borovkov 


Foreword to the Third and Fourth Editions 


This book has been written on the basis of the Russian version (1986) published 
by “Nauka” Publishers in Moscow. A number of sections have been substantially 
revised and several new chapters have been introduced. The author has striven to 
provide a complete and logical exposition and simpler and more illustrative proofs. 
The 1986 text was preceded by two earlier editions (1972 and 1976). The first one 
appeared as an extended version of lecture notes of the course the author taught 
at the Department of Mechanics and Mathematics of Novosibirsk State University. 
Each new edition responded to comments by the readers and was completed with 
new sections which made the exposition more unified and complete. 

The readers are assumed to be familiar with a traditional calculus course. They 
would also benefit from knowing elements of measure theory and, in particular, 
the notion of integral with respect to a measure on an arbitrary space and its basic 
properties. However, provided they are prepared to use a less general version of 
some of the assertions, this lack of additional knowledge will not hinder the reader 
from successfully mastering the material. It is also possible for the reader to avoid 
such complications completely by reading the respective Appendices (located at the 
end of the book) which contain all the necessary results. 

The first ten chapters of the book are devoted to the basics of probability theory 
(including the main limit theorems for cumulative sums of random variables), and it 
is best to read them in succession. The remaining chapters deal with more specific 
parts of the theory of probability and could be divided into two blocks: random 
processes in discrete time (or random sequences, Chaps. 12 and 14–16) and random
processes in continuous time (Chaps. 17–21).

There are also chapters which remain outside the mainstream of the text as indi- 
cated above. These include Chap. 11 “Factorisation Identities”. The chapter not only 
contains a series of very useful probabilistic results, but also displays interesting re- 
lationships between problems on random walks in the presence of boundaries and 
boundary problems of complex analysis. Chapter 13 “Information and Entropy” and 
Chap. 19 “Functional Limit Theorems” also deviate from the mainstream. The for- 
mer deals with problems closely related to probability theory but very rarely treated 
in texts on the discipline. The latter presents limit theorems for the convergence
of processes generated by cumulative sums of random variables to the Wiener and
Poisson processes; as a consequence, the law of the iterated logarithm is established 
in that chapter. 

The book has incorporated a number of methodological improvements. Some 
parts of it are devoted to subjects to be covered in a textbook for the first time (for 
example, Chap. 16 on stochastic recursive sequences playing an important role in 
applications). 

The book can serve as a basis for third year courses for students with a rea- 
sonable mathematical background, and also for postgraduates. A one-semester (or 
two-trimester) course on probability theory might consist (there could be many vari- 
ants) of the following parts: Chaps. 1-2, Sects. 3.1-3.4, 4.1-4.6 (partially), 5.2 and 
5.4 (partially), 6.1-6.3 (partially), 7.1, 7.2, 7.4-7.6, 8.1-8.2 and 8.4 (partially), 10.1, 
10.3, and the main results of Chap. 12. 

For a more detailed exposition of some aspects of Probability Theory and the 
Theory of Random Processes, see for example [2, 10, 12-14, 26, 31]. 

While working on the different versions of the book, I received advice and 
help from many of my colleagues and friends. I am grateful to Yu.V. Prokhorov, 
V.V. Petrov and B.A. Rogozin for their numerous useful comments which helped 
to improve the first variant of the book. I am deeply indebted to A.N. Kolmogorov 
whose remarks and valuable recommendations, especially of methodological char- 
acter, contributed to improvements in the second version of the book. In regard to 
the second and third versions, I am again thankful to V.V. Petrov who gave me his
comments, and to P. Franken, with whom I had a lot of useful discussions while the
book was translated into German. 

In conclusion I want to express my sincere gratitude to V.V. Yurinskii,
A.I. Sakhanenko, K.A. Borovkov, and other colleagues of mine who also gave me their
comments on the manuscript. I would also like to express my gratitude to all those who
contributed, in one way or another, to the preparation and improvement of the book. 


A.A. Borovkov 


For the Reader’s Attention 


The numeration of formulas, lemmas, theorems and corollaries consists of three 
numbers, of which the first two are the numbers of the current chapter and section. 
For instance, Theorem 4.3.1 means Theorem 1 from Sect. 3 of Chap. 4. Section 6.2
means Sect. 2 of Chap. 6. 

The sections marked with an asterisk may be omitted in the first reading. 


The symbol □ at the end of a paragraph denotes the end of a proof or an important
argument, when it should be pointed out that the argument has ended.

The symbol :=, systematically used in the book, means that the left-hand side is 
defined to be given by the right-hand side. The relation =: has the opposite meaning: 
the right-hand side is defined by the left-hand side. 

The reader may find it useful to refer to the Index of Basic Notation and Subject 
index, which can be found at the end of this book. 




Introduction 


1. It is customary to set the origins of Probability Theory at the 17th century and 
relate them to combinatorial problems of games of chance. The latter can hardly be 
considered a serious occupation. However, it is games of chance that led to prob- 
lems which could not be stated and solved within the framework of the then existing 
mathematical models, and thereby stimulated the introduction of new concepts, ap- 
proaches and ideas. These new elements can already be encountered in writings by 
P. Fermat, B. Pascal, C. Huygens and, in a more developed form and somewhat
later, in the works of J. Bernoulli, P.-S. Laplace, C.F. Gauss and others. The above- 
mentioned names undoubtedly decorate the genealogy of Probability Theory which, 
as we saw, is also related to some extent to the vices of society. Incidentally, as it 
soon became clear, it is precisely this last circumstance that can make Probability 
Theory more attractive to the reader. 

The first text on Probability Theory was Huygens’ treatise De Ratiociniis in Ludo 
Aleae (“On Ratiocination in Dice Games”, 1657). A bit later in 1663 the book Liber
de Ludo Aleae (“Book on Games of Chance”) by G. Cardano was published (in
fact it was written earlier, in the mid 16th century). The subject of these treatises 
was the same as in the writings of Fermat and Pascal: dice and card games (prob- 
lems within the framework of Sect. 1.2 of the present book). As if Huygens foresaw 
future events, he wrote that if the reader studied the subject closely, he would no- 
tice that one was not dealing just with a game here, but rather that the foundations 
of a very interesting and deep theory were being laid. Huygens’ treatise, which is 
also known as the first text introducing the concept of mathematical expectation, 
was later included by J. Bernoulli in his famous book Ars Conjectandi (“The Art 
of Conjecturing”; published posthumously in 1713). To this book is related the no- 
tion of the so-called Bernoulli scheme (see Sect. 1.3), for which Bernoulli gave a 
cumbersome (cf. our Sect. 5.1) but mathematically faultless proof of the first limit 
theorem of Probability Theory, the Law of Large Numbers. 

By the end of the 19th and the beginning of the 20th centuries, the natural sci- 
ences led to the formulation of more serious problems which resulted in the develop- 
ment of a large branch of mathematics that is nowadays called Probability Theory. 
This subject is still going through a stage of intensive development. To a large extent,
Probability Theory owes its elegance, modern form and a multitude of achievements
to the remarkable Russian mathematicians P.L. Chebyshev, A.A. Markov, A.N. Kol- 
mogorov and others. 

The fact that increasing our knowledge about nature leads to further demand for 
Probability Theory appears, at first glance, paradoxical. Indeed, as the reader might 
already know, the main object of the theory is randomness, or uncertainty, which is 
due, as a rule, to a lack of knowledge. This is certainly so in the classical example 
of coin tossing, where one cannot take into account all the factors influencing the 
eventual position of the tossed coin when it lands. 

However, this is only an apparent paradox. In fact, there are almost no exact de- 
terministic quantitative laws in nature. Thus, for example, the classical law relating 
the pressure and temperature in a volume of gas is actually a result of a probabilistic 
nature that relates the number of collisions of particles with the vessel walls to their 
velocities. The fact is, at typical temperatures and pressures, the number of particles 
is so large and their individual contributions are so small that, using conventional 
instruments, one simply cannot register the random deviations from the relationship 
which actually take place. This is not the case when one studies more sparse flows 
of particles—say, cosmic rays—although there is no qualitative difference between 
these two examples. 

We could move in a somewhat different direction and name here the uncertainty 
principle stating that one cannot simultaneously obtain exact measurements of any 
two conjugate observables (for example, the position and velocity of an object). 
Here randomness is not entailed by a lack of knowledge, but rather appears as a fun- 
damental phenomenon reflecting the nature of things. For instance, the lifetime of a 
radioactive nucleus is essentially random, and this randomness cannot be eliminated 
by increasing our knowledge. 

Thus, uncertainty was there at the very beginning of the cognition process, and 
it will always accompany us in our quest for knowledge. These are rather general 
comments, of course, but it appears that the answer to the question of when one 
should use the methods of Probability Theory and when one should not will always 
be determined by the relationship between the degree of precision we want to attain 
when studying a given phenomenon and what we know about the nature of the latter. 

2. In almost all areas of human activity there are situations where some exper- 
iments or observations can be repeated a large number of times under the same 
conditions. Probability Theory deals with those experiments of which the result (ex- 
pressed in one way or another) may vary from trial to trial. The events that refer to 
the experiment’s result and which may or may not occur are usually called random 
events. 

For example, suppose we are tossing a coin. The experiment has only two out- 
comes: either heads or tails show up, and before the experiment has been carried 
out, it is impossible to say which one will occur. As we have already noted, the rea- 
son for this is that we cannot take into account all the factors influencing the final 
position of the coin. A similar situation will prevail if you buy a ticket for each lot- 
tery draw and try to predict whether it will win or not, or, observing the operation of 
a complex machine, you try to determine in advance if it will have failed before or
after a given time. In such situations, it is very hard to find any laws when
considering the results of individual experiments. Therefore there is little justification for
constructing any theory here.

Fig. 1 The plot of the relative frequencies $n_h/n$ corresponding to the outcome
sequence htthtthhhthht in the coin tossing experiment (horizontal axis: the number
of trials $n$, from 0 to 13; the level 1/2 is marked on the vertical axis)

However, if one turns to a long sequence of repetitions of such an experiment,
an interesting phenomenon becomes apparent. While individual results of the
experiments display a highly “irregular” behaviour, the average results demonstrate
stability. Consider, say, a long series of repetitions of our coin tossing experiment
and denote by $n_h$ the number of heads in the first $n$ trials. Plot the ratio $n_h/n$
versus the number $n$ of conducted experiments (see Fig. 1; the plot corresponds to the
outcome sequence htthtthhhthh, where h stands for heads and t for tails,
respectively).

We will then see that, as $n$ increases, the polygon connecting the consecutive
points $(n, n_h/n)$ very quickly approaches the straight line $n_h/n = 1/2$. To verify
this observation, G.L. Leclerc, comte de Buffon,¹ tossed a coin 4040 times. The
number of heads was 2048, so that the relative frequency $n_h/n$ of heads was 0.5069.
K. Pearson tossed a coin 24,000 times and got 12,012 heads, so that $n_h/n = 0.5005$.
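
The frequencies quoted above are easy to recompute. The following is only a minimal sketch in Python (not part of the original text): the function names `relative_frequencies` and `simulated_frequency` are hypothetical, and the simulation assumes an idealised fair coin driven by a pseudo-random generator rather than a physical experiment.

```python
import random
from fractions import Fraction


def relative_frequencies(outcomes):
    """Return n_h/n for n = 1, ..., len(outcomes); 'h' marks heads."""
    heads = 0
    freqs = []
    for n, outcome in enumerate(outcomes, start=1):
        if outcome == "h":
            heads += 1
        freqs.append(Fraction(heads, n))
    return freqs


def simulated_frequency(n, seed=0):
    """Relative frequency of heads in n simulated fair tosses (assumption:
    a pseudo-random generator stands in for the physical coin)."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n))
    return heads / n


# The twelve-toss outcome sequence quoted in the text.
freqs = relative_frequencies("htthtthhhthh")
print(", ".join(str(f) for f in freqs))

# The historical counts quoted in the text (Buffon and Pearson).
buffon = 2048 / 4040
pearson = 12012 / 24000
print(round(buffon, 4), round(pearson, 4))

# A long simulated series; the printed value depends on the seed.
print(simulated_frequency(100_000))
```

For the twelve-toss sequence the last frequency is 7/12, and the two historical ratios round to 0.5069 and 0.5005, in agreement with the figures given in the text.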

It turns out that this phenomenon is universal: the relative frequency of a certain 
outcome in a series of repetitions of an experiment under the same conditions tends 
towards a certain number $p \in [0,1]$ as the number of repetitions grows. It is an
objective law of nature which forms the foundation of Probability Theory. 

It would be natural to define the probability of an experiment outcome to be just 
the number p towards which the relative frequency of the outcome tends. How- 
ever, such a definition of probability (usually related to the name of R. von Mises) 
has proven to be inconvenient. First of all, in reality, each time we will be dealing 
not with an infinite sequence of frequencies, but rather with finitely many elements 
thereof. Obtaining the entire sequence is unfeasible. Hence the frequency (let it 
again be $n_h/n$) of the occurrence of a certain outcome will, as a rule, be different
for each new series of repetitions of the same experiment. 

This fact led to intense discussions and a lot of disagreement regarding how one 
should define the concept of probability. Fortunately, there was a class of phenomena 
that possessed certain “symmetry” (in gambling, coin tossing etc.) for which one 
could compute in advance, prior to the experiment, the expected numerical values
of the probabilities. Take, for instance, a cube made of a sufficiently homogeneous
material. There are no reasons for the cube to fall on any of its faces more often
than on some other face. It is therefore natural to expect that, when rolling a die a
large number of times, the frequency of each of its faces will be close to 1/6. Based
on these considerations, Laplace believed that the concept of equiprobability is the
fundamental one for Probability Theory. The probability of an event would then be
defined as the ratio of the number of “favourable” outcomes to the total number of
possible outcomes. Thus, the probability of getting an odd number of points (i.e. 1,
3 or 5) when rolling a die once was declared to be 3/6 (i.e. the number of faces with
an odd number of points was divided by the total number of all faces). If the die were
rolled ten times, then one would have $6^{10}$ in the denominator, as this number gives
the total number of equally likely outcomes and calculating probabilities reduces to
counting the number of “favourable outcomes” (the ones resulting in the occurrence
of a given event).

¹ The data is borrowed from [15].

The development of the mathematical theory of probabilities began at the
moment when probability was first defined as the ratio of the number of favourable
outcomes to the total number of equally likely outcomes, and this approach is
nowadays called “classical” (for more details, see Chap. 1).
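
The classical definition amounts to counting. As a minimal sketch in Python (not part of the original text), the hypothetical helper `classical_probability` below enumerates all equally likely outcomes of several rolls of a die and counts the favourable ones. The "sum equals 7" event is an extra illustration that does not appear in the text.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # the six faces of a die


def classical_probability(event, n_rolls):
    """Ratio of favourable outcomes to all 6**n_rolls equally likely outcomes.

    `event` is a predicate on a tuple of n_rolls face values.
    """
    outcomes = list(product(FACES, repeat=n_rolls))
    favourable = sum(1 for w in outcomes if event(w))
    return Fraction(favourable, len(outcomes))


# Odd number of points in a single roll: 3 favourable faces out of 6.
p_odd = classical_probability(lambda w: w[0] % 2 == 1, 1)

# Illustrative extra event (an assumption, not from the text):
# the sum of two rolls equals 7, i.e. 6 favourable outcomes out of 36.
p_sum7 = classical_probability(lambda w: sum(w) == 7, 2)

# The denominator for ten rolls, as mentioned in the text.
total_ten = 6 ** 10

print(p_odd, p_sum7, total_ten)
```

Brute-force enumeration is practical only for a few rolls; for ten rolls the denominator is already 60,466,176, which is why the classical scheme relies on combinatorial counting instead.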

Later on, at the beginning of the 20th century, this approach was severely crit- 
icised for being too restrictive. The initiator of the critique was R. von Mises. As 
we have already noted, his conception was based on postulating stability of the fre- 
quencies of events in a long series of experiments. That was a confusion of physical 
and mathematical concepts. No passage to the limit can serve as justification for 
introducing the notion of “probability”. If, for instance, the values $n_h/n$ were to
converge to the limiting value 1/2 in Fig. 1 too slowly, that would mean that
nobody would be able to find the value of that limit in the general (non-classical) case.
So the approach is clearly vulnerable: it would mean that Probability Theory would 
be applicable only to those situations where frequencies have a limit. But why fre- 
quencies would have a limit remained unexplained and was not even discussed. 

In this relation, R. von Mises’ conception has been in turn criticised by many 
mathematicians, including A.Ya. Khinchin, S.N. Bernstein, A.N. Kolmogorov and
others. Somewhat later, another approach was suggested that proved to be fruitful 
for the development of the mathematical theory of probabilities. Its general features 
were outlined by S.N. Bernstein in 1908. In 1933 a rather short book “Foundations 
of Probability Theory” by A.N. Kolmogorov appeared that contained a complete 
and clear exposition of the axioms of Probability Theory. The general construction 
of the concept of probability based on Kolmogorov’s axiomatics removed all the 
obstacles for the development of the theory and is nowadays universally accepted. 

The creation of an axiomatic Probability Theory provided a solution to the sixth 
Hilbert problem (which concerned, in particular, Probability Theory) that had been 
formulated by D. Hilbert at the Second International Congress of Mathematicians 
in Paris in 1900. The problem was on the axiomatic construction of a number of 
physical sciences, Probability Theory being classified as such by Hilbert at that 
time. 

An axiomatic foundation separates the mathematical aspect from the physical: 
one no longer needs to explain how and where the concept of probability comes
from. The concept simply becomes a primitive one, its properties being described
by axioms (which are essentially the axioms of Measure Theory). However, the 
problem of how the probability thus introduced is related (and can be applied) to 
the real world remains open. But this problem is mostly removed by the remarkable 
fact that, under the axiomatic construction, the desired fundamental property that the 
frequencies of the occurrence of an event converge to the probability of the event 
does take place and is a precise mathematical result. (For more details, see Chaps. 2 
and 5.)²

We will begin by defining probability in a somewhat simplified situation, in the 
so-called discrete case. 


² Much later, in the 1960s, A.N. Kolmogorov attempted to develop a fundamentally different
approach to the notions of probability and randomness. In that approach, the measure of randomness,
say, of a sequence 0, 1, 0, 0, 1, ... consisting of 0s and 1s (or some other symbols) is the
complexity of the algorithm describing this sequence. The new approach stimulated the development of a
number of directions in contemporary mathematics, but, mostly due to its complexity, has not yet 
become widely accepted. 


Contents 


1 Discrete Spaces of Elementary Events
1.1 Probability Space
1.2 The Classical Scheme
1.3 The Bernoulli Scheme
1.4 The Probability of the Union of Events. Examples

2 An Arbitrary Space of Elementary Events
2.1 The Axioms of Probability Theory. A Probability Space
2.2 Properties of Probability
2.3 Conditional Probability. Independence of Events and Trials
2.4 The Total Probability Formula. The Bayes Formula

3 Random Variables and Distribution Functions
3.1 Definitions and Examples
3.2 Properties of Distribution Functions. Examples
    3.2.1 The Basic Properties of Distribution Functions
    3.2.2 The Most Common Distributions
    3.2.3 The Three Distribution Types
    3.2.4 Distributions of Functions of Random Variables
3.3 Multivariate Random Variables
3.4 Independence of Random Variables and Classes of Events
    3.4.1 Independence of Random Vectors
    3.4.2 Independence of Classes of Events
    3.4.3 Relations Between the Introduced Notions
3.5 On Infinite Sequences of Random Variables
3.6 Integrals
    3.6.1 Integral with Respect to Measure
    3.6.2 The Stieltjes Integral
    3.6.3 Integrals of Multivariate Random Variables. The Distribution of the Sum of Independent Random Variables

4 Numerical Characteristics of Random Variables
4.1 Expectation
4.2 Conditional Distribution Functions and Conditional Expectations
4.3 Expectations of Functions of Independent Random Variables
4.4 Expectations of Sums of a Random Number of Random Variables
4.5 Variance
4.6 The Correlation Coefficient and Other Numerical Characteristics
4.7 Inequalities
    4.7.1 Moment Inequalities
    4.7.2 Inequalities for Probabilities
4.8 Extension of the Notion of Conditional Expectation
    4.8.1 Definition of Conditional Expectation
    4.8.2 Properties of Conditional Expectations
4.9 Conditional Distributions

5 Sequences of Independent Trials with Two Outcomes
5.1 Laws of Large Numbers
5.2 The Local Limit Theorem and Its Refinements
    5.2.1 The Local Limit Theorem
    5.2.2 Refinements of the Local Theorem
    5.2.3 The Local Limit Theorem for the Polynomial Distributions
5.3 The de Moivre–Laplace Theorem and Its Refinements
5.4 The Poisson Theorem and Its Refinements
    5.4.1 Quantifying the Closeness of Poisson Distributions to Those of the Sums $S_n$
    5.4.2 The Triangular Array Scheme. The Poisson Theorem
5.5 Inequalities for Large Deviation Probabilities in the Bernoulli Scheme

6 On Convergence of Random Variables and Distributions
6.1 Convergence of Random Variables
    6.1.1 Types of Convergence
    6.1.2 The Continuity Theorem
    6.1.3 Uniform Integrability and Its Consequences
6.2 Convergence of Distributions
6.3 Conditions for Weak Convergence

7 Characteristic Functions
7.1 Definition and Properties of Characteristic Functions
    7.1.1 Properties of Characteristic Functions
    7.1.2 The Properties of Ch.F.s Related to the Structure of the Distribution of $\xi$
7.2 Inversion Formulas
    7.2.1 The Inversion Formula for Densities
    7.2.2 The Inversion Formula for Distributions
    7.2.3 The Inversion Formula in $L_2$. The Class of Functions that Are Both Densities and Ch.F.s
7.3 The Continuity (Convergence) Theorem
7.4 The Application of Characteristic Functions in the Proof of the 
Poisson Theorem... .......0.-. 0. eee ee ee 169 
7.5 Characteristic Functions of Multivariate Distributions. 
The Multivariate Normal Distribution. ............... wal 
7.6 Other Applications of Characteristic Functions. The Properties of 
the Gamma Distribution ................2.--.0004 175 
7.6.1 Stability of the Distributions ®, ,2 and Kyo ........ 175 
7.6.2 The I-distribution and its properties ............ 176 
7.7 Generating Functions. Application to Branching Processes. 
A Problem on Extinction ..............02.-.00004 180 
7.7.1. Generating Functions ...............-.-000. 180 
7.7.2The Simplest Branching Processes ............. 180 
8 Sequences of Independent Random Variables. Limit Theorems .. . 185 
8.1 The Law of Large Numbers ....................0. 185 
8.2 The Central Limit Theorem for Identically Distributed Random 
Variables. 2 i i a3 ee eR ROGER eee BEE Ee ee 187 
8.3 The Law of Large Numbers for Arbitrary Independent Random 
Variables: ie. oa eo a oe SA eel eo wae ee 188 
8.4 The Central Limit Theorem for Sums of Arbitrary Independent 
Random Variables .........0.. 0.220200 eee eee 199 
8.5 Another Approach to Proving Limit Theorems. Estimating 
Approximation Rates .............2.... 00000. 209 
8.6 The Law of Large Numbers and the Central Limit Theorem in the 
Multivariate Case 2... 2k ea ee eR we we eS 214 
8.7 Integro-Local and Local Limit Theorems for Sums of Identically 
Distributed Random Variables with Finite Variance. ........ 216 
8.7.1 Integro-Local Theorems ..................., 216 
8.7.2. Local Theorems: 000-3 ce wa eek Sk ee BR we a 219 
8.7.3 The Proof of Theorem 8.7.1 in the General Case ...... 222 
8.7.4 Uniform Versions of Theorems 8.7.1—8.7.3 for Random 
Variables Depending ona Parameter ............ 225 
8.8 Convergence to Other Limiting Laws ................ a2F 
8.8.1 TheIntegral Theorem ...................., 230 
8.8.2 The Integro-Local and Local Theorems ........... 230 
8.8.3 AnExample ..................2....00., 236 


9 Large Deviation Probabilities for Sums of Independent Random Variables
9.1 Laplace’s and Cramér’s Transforms. The Rate Function
9.1.1 The Cramér Condition. Laplace’s and Cramér’s Transforms
9.1.2 The Large Deviation Rate Function
9.2 A Relationship Between Large Deviation Probabilities for Sums of Random Variables and Those for Sums of Their Cramér Transforms. The Probabilistic Meaning of the Rate Function
9.2.1 A Relationship Between Large Deviation Probabilities for Sums of Random Variables and Those for Sums of Their Cramér Transforms
9.2.2 The Probabilistic Meaning of the Rate Function
9.2.3 The Large Deviations Principle
9.3 Integro-Local, Integral and Local Theorems on Large Deviation Probabilities in the Cramér Range
9.3.1 Integro-Local and Integral Theorems
9.3.2 Local Theorems
9.4 Integro-Local Theorems at the Boundary of the Cramér Range
9.4.1 Introduction
9.4.2 The Probabilities of Large Deviations of $S_n$ in an $o(n)$-Vicinity of the Point $\alpha_+ n$; the Case $\psi''(\lambda_+) < \infty$
9.4.3 The Class of Distributions ER. The Probability of Large Deviations of $S_n$ in an $o(n)$-Vicinity of the Point $\alpha_+ n$ for Distributions F from the Class ER in Case $\psi''(\lambda_+) = \infty$
9.4.4 On the Large Deviation Probabilities in the Range $\alpha > \alpha_+$ for Distributions from the Class ER
9.5 Integral and Integro-Local Theorems on Large Deviation Probabilities for Sums $S_n$ when the Cramér Condition Is not Met
9.5.1 Integral Theorems
9.5.2 Integro-Local Theorems
9.6 Integro-Local Theorems on the Probabilities of Large Deviations of $S_n$ Outside the Cramér Range (Under the Cramér Condition)

10 Renewal Processes
10.1 Renewal Processes. Renewal Functions
10.1.1 Introduction
10.1.2 The Integral Renewal Theorem for Non-identically Distributed Summands
10.2 The Key Renewal Theorem in the Arithmetic Case
10.3 The Excess and Defect of a Random Walk. Their Limiting Distribution in the Arithmetic Case
10.4 The Renewal Theorem and the Limiting Behaviour of the Excess and Defect in the Non-arithmetic Case
10.5 The Law of Large Numbers and the Central Limit Theorem for Renewal Processes
10.5.1 The Law of Large Numbers
10.5.2 The Central Limit Theorem
10.5.3 A Theorem on the Finiteness of the Infimum of the Cumulative Sums
10.5.4 Stochastic Inequalities. The Law of Large Numbers and the Central Limit Theorem for the Maximum of Sums of Non-identically Distributed Random Variables Taking Values of Both Signs
10.5.5 Extension of Theorems 10.5.1 and 10.5.2 to Random Variables Assuming Values of Both Signs
10.5.6 The Local Limit Theorem
10.6 Generalised Renewal Processes
10.6.1 Definition and Some Properties
10.6.2 The Central Limit Theorem
10.6.3 The Integro-Local Theorem


11 Properties of the Trajectories of Random Walks. Zero-One Laws
11.1 Zero-One Laws. Upper and Lower Functions
11.1.1 Zero-One Laws
11.1.2 Lower and Upper Functions
11.2 Convergence of Series of Independent Random Variables
11.3 The Strong Law of Large Numbers
11.4 The Strong Law of Large Numbers for Arbitrary Independent Variables
11.5 The Strong Law of Large Numbers for Generalised Renewal Processes
11.5.1 The Strong Law of Large Numbers for Renewal Processes
11.5.2 The Strong Law of Large Numbers for Generalised Renewal Processes


12 Random Walks and Factorisation Identities
12.1 Factorisation Identities
12.1.1 Factorisation
12.1.2 The Canonical Factorisation of the Function $f_z(\lambda) = 1 - z\varphi(\lambda)$
12.1.3 The Second Factorisation Identity
12.2 Some Consequences of Theorems 12.1.1–12.1.3
12.2.1 Direct Consequences
12.2.2 A Generalisation of the Strong Law of Large Numbers
12.3 Pollaczek–Spitzer’s Identity. An Identity for $S = \sup_{k \ge 0} S_k$
12.3.1 Pollaczek–Spitzer’s Identity
12.3.2 An Identity for $S = \sup_{k \ge 0} S_k$
12.4 The Distribution of $S$ in Insurance Problems and Queueing Theory
12.4.1 Random Walks in Risk Theory
12.4.2 Queueing Systems
12.4.3 Stochastic Models in Continuous Time
12.5 Cases Where Factorisation Components Can Be Found in an Explicit Form. The Non-lattice Case
12.5.1 Preliminary Notes on the Uniqueness of Factorisation
12.5.2 Classes of Distributions on the Positive Half-Line with Rational Ch.F.s
12.5.3 Explicit Canonical Factorisation of the Function $v(\lambda)$ in the Case when the Right Tail of the Distribution F Is an Exponential Polynomial
12.5.4 Explicit Factorisation of the Function $v(\lambda)$ when the Left Tail of the Distribution F Is an Exponential Polynomial
12.5.5 Explicit Canonical Factorisation for the Function $v(\lambda)$
12.6 Explicit Form of Factorisation in the Arithmetic Case
12.6.1 Preliminary Remarks on the Uniqueness of Factorisation
12.6.2 The Classes of Distributions on the Positive Half-Line with Rational Generating Functions
12.6.3 Explicit Canonical Factorisation of the Function $v(z)$ in the Case when the Right Tail of the Distribution F Is an Exponential Polynomial
12.6.4 Explicit Canonical Factorisation of the Function $v(z)$ when the Left Tail of the Distribution F Is an Exponential Polynomial
12.6.5 Explicit Factorisation of the Function $v^0(z)$
12.7 Asymptotic Properties of the Distributions of $\chi_\pm$ and $S$
12.7.1 The Asymptotics of $P(\chi_+ > x \mid \eta_+ < \infty)$ and $P(\chi_-^0 < -x)$ in the Case $E\xi \le 0$
12.7.2 The Asymptotics of $P(S > x)$
12.7.3 The Distribution of the Maximal Values of Generalised Renewal Processes
12.8 On the Distribution of the First Passage Time
12.8.1 The Properties of the Distributions of the Times $\eta_\pm$
12.8.2 The Distribution of the First Passage Time of an Arbitrary Level $x$ by Arithmetic Skip-Free Walks
13 Sequences of Dependent Trials. Markov Chains
13.1 Countable Markov Chains. Definitions and Examples. Classification of States
13.1.1 Definition and Examples
13.1.2 Classification of States
13.2 Necessary and Sufficient Conditions for Recurrence of States. Types of States in an Irreducible Chain. The Structure of a Periodic Chain
13.3 Theorems on Random Walks on a Lattice
13.3.1 Symmetric Random Walks in $\mathbb{R}^k$, $k \ge 2$
13.3.2 Arbitrary Symmetric Random Walks on the Line
13.4 Limit Theorems for Countable Homogeneous Chains
13.4.1 Ergodic Theorems
13.4.2 The Law of Large Numbers and the Central Limit Theorem for the Number of Visits to a Given State
13.5 The Behaviour of Transition Probabilities for Reducible Chains
13.6 Markov Chains with Arbitrary State Spaces. Ergodicity of Chains with Positive Atoms
13.6.1 Markov Chains with Arbitrary State Spaces
13.6.2 Markov Chains Having a Positive Atom
13.7.1 The Ergodic Theorem
13.7.2 On Conditions (I) and (II)
13.8 Laws of Large Numbers and the Central Limit Theorem for Sums of Random Variables Defined on a Markov Chain
13.8.1 Random Variables Defined on a Markov Chain
13.8.2 Laws of Large Numbers
13.8.3 The Central Limit Theorem


14 Information and Entropy
14.1 The Definitions and Properties of Information and Entropy
14.2 The Entropy of a Finite Markov Chain. A Theorem on the Asymptotic Behaviour of the Information Contained in a Long Message; Its Applications
14.2.1 The Entropy of a Sequence of Trials Forming a Stationary Markov Chain
14.2.2 The Law of Large Numbers for the Amount of Information Contained in a Message
14.2.3 The Asymptotic Behaviour of the Number of the Most Common Outcomes in a Sequence of Trials

15 Martingales
15.1 Definitions, Simplest Properties, and Examples
15.2 The Martingale Property and Random Change of Time. Wald’s Identity
15.3 Inequalities
15.3.1 Inequalities for Martingales
15.3.2 Inequalities for the Number of Crossings of a Strip
15.4 Convergence Theorems
15.5 Boundedness of the Moments of Stochastic Sequences

16 Stationary Sequences
16.1 Basic Notions
16.2 Ergodicity (Metric Transitivity), Mixing and Weak Dependence
16.3 The Ergodic Theorem

17 Stochastic Recursive Sequences
17.1 Basic Concepts
17.2 Ergodicity and Renovating Events. Boundedness Conditions
17.2.1 Ergodicity of Stochastic Recursive Sequences
17.2.2 Boundedness of Random Sequences
17.3 Ergodicity Conditions Related to the Monotonicity of $f$
17.4 Ergodicity Conditions for Contracting in Mean Lipschitz Transformations

18 Continuous Time Random Processes
18.1 General Definitions
18.2 Criteria of Regularity of Processes

19 Processes with Independent Increments
19.1 General Properties
19.2 Wiener Processes. The Properties of Trajectories
19.3 The Laws of the Iterated Logarithm
19.4 The Poisson Process
19.5 Description of the Class of Processes with Independent Increments

20 Functional Limit Theorems
20.1 Convergence to the Wiener Process
20.2 The Law of the Iterated Logarithm
20.3 Convergence to the Poisson Process
20.3.1 Convergence of the Processes of Cumulative Sums
20.3.2 Convergence of Sums of Thinning Renewal Processes

21 Markov Processes
21.1 Definitions and General Properties
21.1.1 Definition and Basic Properties
21.1.2 Transition Probability
21.2 Markov Processes with Countable State Spaces. Examples
21.2.1 Basic Properties of the Process
21.2.2 Examples
21.3 Branching Processes
21.4 Semi-Markov Processes
21.4.1 Semi-Markov Processes on the States of a Chain
21.4.2 The Ergodic Theorem
21.4.3 Semi-Markov Processes on Chain Transitions
21.5 Regenerative Processes
21.5.1 Regenerative Processes. The Ergodic Theorem
21.5.2 The Laws of Large Numbers and Central Limit Theorem for Integrals of Regenerative Processes
21.6 Diffusion Processes

22 Processes with Finite Second Moments. Gaussian Processes
22.1 Processes with Finite Second Moments
22.2 Gaussian Processes
22.3 Prediction Problem


Appendix 1 Extension of a Probability Measure

Appendix 2 Kolmogorov’s Theorem on Consistent Distributions

Appendix 3 Elements of Measure Theory and Integration
3.1 Measure Spaces
3.2 The Integral with Respect to a Probability Measure
3.2.1 The Integrals of a Simple Function
3.2.2 The Integrals of an Arbitrary Function
3.2.3 Properties of Integrals
3.3 Further Properties of Integrals
3.3.1 Convergence Theorems
3.3.2 Connection to Integration with Respect to a Measure on the Real Line
3.3.3 Product Measures and Iterated Integrals
3.4 The Integral with Respect to an Arbitrary Measure
3.5 The Lebesgue Decomposition Theorem and the Radon–Nikodym Theorem
3.6 Weak Convergence and Convergence in Total Variation of Distributions in Arbitrary Spaces
3.6.1 Weak Convergence
3.6.2 Convergence in Total Variation

Appendix 4 The Helly and Arzelà–Ascoli Theorems

Appendix 5 The Proof of the Berry-Esseen Theorem

Appendix 6 The Basic Properties of Regularly Varying Functions and Subexponential Distributions
6.1 General Properties of Regularly Varying Functions
6.2 The Basic Asymptotic Properties
6.3 The Asymptotic Properties of the Transforms of R.V.F.s (Abel-Type Theorems)
6.4 Subexponential Distributions and Their Properties

Appendix 7 The Proofs of Theorems on Convergence to Stable Laws
7.1 The Integral Limit Theorem
7.2 The Integro-Local and Local Limit Theorems

Appendix 8 Upper and Lower Bounds for the Distributions of the Sums and the Maxima of the Sums of Independent Random Variables
8.1 Upper Bounds Under the Cramér Condition
8.2 Upper Bounds when the Cramér Condition Is Not Met
8.3 Lower Bounds

Appendix 9 Renewal Theorems

References

Index of Basic Notation

Subject Index


Chapter 1 
Discrete Spaces of Elementary Events 


Abstract Section 1.1 introduces the fundamental concept of probability space,
along with some basic terminology and properties of probability when it is easy
to do, i.e. in the simple case of random experiments with finitely or at most
countably many outcomes. The classical scheme of finitely many equally likely
outcomes is discussed in more detail in Sect. 1.2. Then the Bernoulli scheme is
introduced and the properties of the binomial distribution are studied in
Sect. 1.3. Sampling without replacement from a large population is considered,
and convergence of the emerging hypergeometric distributions to the binomial one
is formally proved. The inclusion-exclusion formula for the probabilities of
unions of events is derived and illustrated by some applications in Sect. 1.4.


1.1 Probability Space 


To mathematically describe experiments with random outcomes, we will first of all
need the notion of the space of elementary events (or outcomes) corresponding to the
experiment under consideration. We will denote by $\Omega$ any set such that each result
of the experiment we are interested in can be uniquely specified by the elements
of $\Omega$.

In the simplest experiments we usually deal with finite spaces of elementary
outcomes. In the coin tossing example we considered above, $\Omega$ consists of two
elements, “heads” and “tails”. In the die rolling experiment, the space $\Omega$ is also finite
and consists of 6 elements. However, even for tossing a coin (or rolling a die) one
can arrange such experiments for which finite spaces of elementary events will not
suffice. For instance, consider the following experiment: a coin is tossed until heads
shows for the first time, and then the experiment is stopped. If t designates tails in
a toss and h heads, then an “elementary outcome” of the experiment can be
represented by a sequence (tt...th). There are infinitely many such sequences, and all
of them are different, so there is no way to describe unambiguously all the outcomes
of the experiment by elements of a finite space.

Consider finite or countably infinite spaces of elementary events $\Omega$. These are
the so-called discrete spaces. We will denote the elements of a space $\Omega$ by the letter
$\omega$ and call them elementary events (or elementary outcomes).


A.A. Borovkov, Probability Theory, Universitext, 1 
DOI 10.1007/978-1-4471-5201-9_1, © Springer-Verlag London 2013 




The notion of the space of elementary events itself is mathematically undefinable:
it is a primitive one, like the notion of a point in geometry. The specific nature of $\Omega$
will, as a rule, be of no interest to us.

Any subset $A \subset \Omega$ will be called an event (the event $A$ occurs if any of the
elementary outcomes $\omega \in A$ occurs).

The union or sum of two events $A$ and $B$ is the event $A \cup B$ (which may also be
denoted by $A + B$) consisting of the elementary outcomes which belong to at least
one of the events $A$ and $B$. The product or intersection $AB$ (which is often denoted
by $A \cap B$ as well) is the event consisting of all elementary events belonging to both
$A$ and $B$. The difference of the events $A$ and $B$ is the set $A - B$ (also often denoted
by $A \setminus B$) consisting of all elements of $A$ not belonging to $B$. The set $\Omega$ is called the
certain event. The empty set $\emptyset$ is called the impossible event. The set $\bar{A} = \Omega - A$
is called the complementary event of $A$. Two events $A$ and $B$ are mutually exclusive
if $AB = \emptyset$.

Let, for instance, our experiment consist in rolling a die twice. Here one can take
the space of elementary events to be the set consisting of 36 elements $(i, j)$, where $i$
and $j$ run from 1 to 6 and denote the numbers of points that show up in the first and
second roll respectively. The events $A = \{i + j \le 3\}$ and $B = \{j = 6\}$ are mutually
exclusive. The product of the events $A$ and $C = \{j \text{ is even}\}$ is the event $(1, 2)$. Note
that if we were interested in the events related to the first roll only, we could consider
a smaller space of elementary events consisting of just 6 elements $i = 1, 2, \ldots, 6$.
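
The two-roll example can be checked by direct enumeration. The following is a minimal Python sketch added for illustration, not part of the original text; the names `omega`, `A`, `B` and `C` are chosen here, and A is taken to be the event "i + j is at most 3", the reading under which the product AC is the single outcome (1, 2).

```python
from itertools import product

# Space of elementary events for two rolls of a die: 36 ordered pairs (i, j).
omega = set(product(range(1, 7), repeat=2))

# Events are simply subsets of omega.
A = {(i, j) for (i, j) in omega if i + j <= 3}
B = {(i, j) for (i, j) in omega if j == 6}
C = {(i, j) for (i, j) in omega if j % 2 == 0}

print(len(omega))  # 36
print(A & B)       # set(): A and B are mutually exclusive
print(A & C)       # {(1, 2)}
```

Representing events as Python sets makes the union, product and difference of events correspond directly to the operators `|`, `&` and `-`.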

One says that the probabilities of elementary events are given if a nonnegative
real-valued function $P$ is given on $\Omega$ such that $\sum_{\omega \in \Omega} P(\omega) = 1$ (one also says that
the function $P$ specifies a probability distribution on $\Omega$).

The probability of an event $A$ is the number

$$P(A) := \sum_{\omega \in A} P(\omega).$$

This definition is consistent, for the series on the right hand side is absolutely
convergent.

We note here that specific numerical values of the function $P$ will also be of no
interest to us: this is just an issue of the practical value of the model. For instance,
it is clear that, in the case of a symmetric die, for the outcomes 1, 2, ..., 6 one
should put $P(1) = P(2) = \cdots = P(6) = 1/6$; for a symmetric coin, one has to choose
the values $P(h) = P(t) = 1/2$ and not any others. In the experiment of tossing a
coin until heads shows for the first time, one should put $P(h) = 1/2$, $P(th) = 1/2^2$,
$P(tth) = 1/2^3, \ldots$. Since $\sum_{n=1}^{\infty} 2^{-n} = 1$, the function $P$ given in this way on the
outcomes of the form $(t \ldots th)$ will define a probability distribution on $\Omega$. For
example, to calculate the probability that the experiment stops on an even step (that is,
the probability of the event composed of the outcomes $(th)$, $(ttth)$, ...), one should
consider the sum of the corresponding probabilities which is equal to


$$\sum_{n=1}^{\infty} 2^{-2n} = \frac{1}{4} \times \frac{4}{3} = \frac{1}{3}.$$
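
The value 1/3 for stopping on an even step can be checked numerically. The sketch below is an added illustration in Python using exact fractions; the function name `p_stop_at` and the truncation level `N` are assumptions made here, not notation from the text.

```python
from fractions import Fraction

def p_stop_at(n):
    """Probability that heads first shows on toss n, i.e. P(t...th) = 2**(-n)."""
    return Fraction(1, 2 ** n)

N = 30  # truncation level, an arbitrary choice for this illustration

total = sum(p_stop_at(n) for n in range(1, N + 1))
even = sum(p_stop_at(n) for n in range(2, N + 1, 2))

# The partial sums differ from the limits 1 and 1/3 only by the neglected tails.
print(float(total))  # close to 1
print(float(even))   # close to 1/3
```

The tails that are cut off equal $2^{-N}$ and $4^{-N/2}/3$ respectively, so the partial sums approach 1 and 1/3 geometrically fast.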




In the experiments mentioned in the Introduction, where one had to guess when
a device will break down (before a given time, which is the event $A$, or after it),
quantitative estimates of the probability $P(A)$ can usually only be based on the
results of the experiments themselves. The methods of estimating unknown probabilities
from observation results are studied in Mathematical Statistics, the subject-matter of which
will be exemplified somewhat later by a problem from this chapter.

Note further that by no means can one construct models with discrete spaces of 
elementary events for all experiments. For example, suppose that one is measuring 
the energy of particles whose possible values fill the interval [0, V], V > 0, but the 
set of points of this interval (that is, the set of elementary events) is continuous. 
Or suppose that the result of an experiment is a patient’s electrocardiogram. In this 
case, the result of the experiment is an element of some functional space. In such 
cases, more general schemes are needed. 

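From the above definitions, making use of the absolute convergence of the series
$\sum_{\omega \in A} P(\omega)$, one can easily derive the following properties of probability:


(1) $P(\emptyset) = 0$, $P(\Omega) = 1$.

(2) $P(A + B) = \sum_{\omega \in A \cup B} P(\omega) = \sum_{\omega \in A} P(\omega) + \sum_{\omega \in B} P(\omega) - \sum_{\omega \in A \cap B} P(\omega) = P(A) + P(B) - P(AB)$.

(3) $P(\bar{A}) = 1 - P(A)$.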


This entails, in particular, that, for disjoint (mutually exclusive) events A and B, 
P(A + B) =P(A) + P(B). 
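
The properties listed above can be verified on a small discrete space. The following Python sketch is an added illustration under stated assumptions: it uses the two-roll space with equal weights 1/36, and the events `A` and `B` as well as the helper `prob` are hypothetical names chosen for this check only.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))
p = {w: Fraction(1, 36) for w in omega}  # symmetric die: equal weights

def prob(event):
    """P(A) as the sum of P(w) over the elementary outcomes w in A."""
    return sum((p[w] for w in event), Fraction(0))

A = {w for w in omega if w[0] + w[1] <= 4}  # sum of points is at most 4
B = {w for w in omega if w[1] % 2 == 0}     # second roll is even

lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
complement = prob(set(omega) - A)

print(lhs, rhs)                  # both equal 11/18
print(complement, 1 - prob(A))   # both equal 5/6
```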


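This property of the additivity of probability continues to hold for an arbitrary
number of disjoint events $A_1, A_2, \ldots$: if $A_i A_j = \emptyset$ for $i \ne j$, then

$$P\Big(\bigcup_{k=1}^{\infty} A_k\Big) = \sum_{k=1}^{\infty} P(A_k). \tag{1.1.1}$$

This follows from the equality

$$P\Big(\bigcup_{k=1}^{n} A_k\Big) = \sum_{k=1}^{n} P(A_k)$$

and the fact that $P\big(\bigcup_{k=n+1}^{\infty} A_k\big) \to 0$ as $n \to \infty$. To prove the last relation, first
enumerate the elementary events. Then we will be dealing with the sequence
$\omega_1, \omega_2, \ldots$; $\bigcup_k \omega_k = \Omega$, $P\big(\bigcup_{k>n} \omega_k\big) = \sum_{k>n} P(\omega_k) \to 0$ as $n \to \infty$. Denote by
$n_k$ the index of the event $A_j$ for which $\omega_k \in A_j$, so that $A_j = A_{n_k}$; put $n_k = 0$ if
$\omega_k A_j = \emptyset$ for all $j$. If $n_k \le N < \infty$ for all $k$, then the events $A_j$ with $j > N$ are empty and
the desired relation is obvious. If $N_s := \max_{k \le s} n_k \to \infty$ as $s \to \infty$, then one has
$\bigcup_{j>n} A_j \subset \bigcup_{k>s} \omega_k$ for $n > N_s$, and therefore

$$P\Big(\bigcup_{j>n} A_j\Big) \le P\Big(\bigcup_{k>s} \omega_k\Big) = \sum_{k>s} P(\omega_k) \to 0 \quad \text{as } s \to \infty.$$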




The required relation is proved.

For arbitrary $A$ and $B$, one has $P(A + B) \le P(A) + P(B)$. A similar inequality
also holds for the sum of an arbitrary number of events:

$$P\Big(\bigcup_{k=1}^{\infty} A_k\Big) \le \sum_{k=1}^{\infty} P(A_k).$$

This follows from (1.1.1) and the representation of $\bigcup A_k$ as the union $\bigcup A_k \bar{B}_k$ of
disjoint events $A_k \bar{B}_k$, where $B_k = \bigcup_{j<k} A_j$. It remains to note that
$P(A_k \bar{B}_k) \le P(A_k)$.

Now we will consider several important special cases. 


1.2 The Classical Scheme 


Let $\Omega$ consist of $n$ elements and all the outcomes be equally likely, that is
$P(\omega) = 1/n$ for any $\omega \in \Omega$. In this case, the probability of any event $A$ is defined by the
formula

$$P(A) := \frac{1}{n} \{\text{number of elements of } A\}.$$


This is the so-called classical definition of probability (the term uniform discrete 
distribution is also used). 

Let a set $\{a_1, a_2, \ldots, a_n\}$ be given, which we will call the general
population. A sample of size $k$ from the general population is an ordered sequence
$(a_{j_1}, a_{j_2}, \ldots, a_{j_k})$. One can form this sequence as follows: the first element $a_{j_1}$ is
chosen from the whole population. The next element $a_{j_2}$ we choose from the general
population without the element $a_{j_1}$; the element $a_{j_3}$ is chosen from the general
population without the elements $a_{j_1}$ and $a_{j_2}$, and so on. Samples obtained in such a way
are called samples without replacement. Clearly, one must have $k \le n$ in this case.
The number of such samples of size $k$ coincides with the number of arrangements
of $k$ elements from $n$:

$$(n)_k = n(n-1)(n-2)\cdots(n-k+1).$$


Indeed, according to the sampling process, in the first position we can have any
element of the general population, in the second position any of the remaining
$n - 1$ elements, and so on. We could prove this more formally by induction on $k$.
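
The count of ordered samples without replacement can be confirmed by brute force for a small population. This Python sketch is an added illustration; the population of five labelled items and the variable names are assumptions made here.

```python
from itertools import permutations
from math import perm

population = ["a1", "a2", "a3", "a4", "a5"]  # n = 5, an illustrative choice
n, k = len(population), 3

# All ordered samples without replacement of size k.
samples = list(permutations(population, k))

# The product n (n - 1) ... (n - k + 1), computed factor by factor.
falling = 1
for i in range(k):
    falling *= n - i

print(len(samples), falling, perm(n, k))  # 60 60 60
```

Here `math.perm(n, k)` (available from Python 3.8) returns the same number of arrangements directly.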

Assign to each of the samples without replacement the probability $1/(n)_k$. Such
a sample will be called random. This is clearly the classical scheme.

Calculate the probability that $a_{j_1} = a_1$ and $a_{j_2} = a_2$. Since the remaining $k - 2$
positions can be occupied by any of the remaining $n - 2$ elements of the general
population, the number of samples without replacement having elements $a_1$ and $a_2$
in the first two positions equals $(n-2)_{k-2}$. Therefore the probability of that event
is equal to

$$\frac{(n-2)_{k-2}}{(n)_k} = \frac{1}{n(n-1)}.$$

One can think of a sample without replacement as the result of sequential sampling
from a collection of enumerated balls placed in an urn. Sampled balls are not
returned back to the urn.
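
The probability that two prescribed elements occupy the first two positions of a random sample without replacement can be checked by enumeration. The sketch below is an added Python illustration; the sizes n = 6, k = 4 and the encoding of the m-th population element by the integer m are assumptions made here.

```python
from fractions import Fraction
from itertools import permutations

n, k = 6, 4                    # illustrative sizes with k <= n
population = range(1, n + 1)   # the integer m stands for the m-th element

samples = list(permutations(population, k))
favourable = [s for s in samples if s[0] == 1 and s[1] == 2]

prob = Fraction(len(favourable), len(samples))
print(prob)  # 1/30, which equals 1/(n(n-1)) for n = 6
```

Note that the answer does not depend on the sample size k, in agreement with the formula.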

However, one can form a sample in another way as well. One takes a ball out of
the urn and memorises it. Then the ball is returned to the urn, and one again picks
a ball from the urn; this ball is also memorised and put back to the urn, and so on.
The sample obtained in this way is called a sample with replacement. At each step,
one can pick any of the $n$ balls. There are $k$ such steps, so that the total number of
such samples will be $n^k$. If we assign the probability of $1/n^k$ to each sample, this
will also be a classical scheme situation.

Calculate, for instance, the probability that, in a sample with replacement of size
$k \le n$, all the elements will be different. The number of samples of elements without
repetitions is the same as the number of samples without replacement, i.e. $(n)_k$.
Therefore the desired probability is $(n)_k / n^k$.
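
The probability that all elements of a sample with replacement are different can also be confirmed by listing every sample. This is an added Python sketch with illustrative sizes n = 5 and k = 3 chosen here.

```python
from fractions import Fraction
from itertools import product
from math import perm

n, k = 5, 3  # illustrative sizes with k <= n

# All n**k samples with replacement; each has probability 1/n**k.
samples = list(product(range(n), repeat=k))
distinct = [s for s in samples if len(set(s)) == k]

prob = Fraction(len(distinct), len(samples))
print(prob)  # 12/25
```

The enumerated value agrees with the number of arrangements `perm(n, k)` divided by `n ** k`.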

We now return to sampling without replacement for the general population
$\{a_1, a_2, \ldots, a_n\}$. We will be interested in the number of samples of size $k \le n$ which
differ from each other in their composition only. The number of samples without
replacement of size $k$ which have the same composition and are only distinguished
by the order of their elements is $k!$. Hence the number of samples of different
composition equals

$$\frac{(n)_k}{k!} = \binom{n}{k}.$$

This is the number of combinations of $k$ items chosen from a total of $n$ for
$0 \le k \le n$.¹ If the initial sample is random, we again get the classical probability scheme,
for the probability of each new sample is

$$\frac{k!}{(n)_k} = \binom{n}{k}^{-1}.$$

¹ In what follows, we put $\binom{n}{k} = 0$ for $k < 0$ and $k > n$.

Let our urn contain $n$ balls, of which $n_1$ are black and $n - n_1$ white. We sample $k$
balls without replacement. What is the probability that there will be exactly $k_1$ black
balls in the sample? The total number of samples which differ in the composition
is, as was shown above, $\binom{n}{k}$. There are $\binom{n_1}{k_1}$ ways to choose $k_1$ black balls from the
totality of $n_1$ black balls. The remaining $k - k_1$ white balls can be chosen from the
totality of $n - n_1$ white balls in $\binom{n-n_1}{k-k_1}$ ways. Note that clearly any collection of
black balls can be combined with any collection of white balls. Therefore the total
number of samples of size $k$ which differ in composition and contain exactly $k_1$
black balls is $\binom{n_1}{k_1}\binom{n-n_1}{k-k_1}$. Thus the desired probability is equal to

$$P_{n_1,n}(k_1,k) = \binom{n_1}{k_1}\binom{n-n_1}{k-k_1} \Big/ \binom{n}{k}.$$


The collection of numbers $P_{n_1,n}(0,k), P_{n_1,n}(1,k), \ldots, P_{n_1,n}(k,k)$ forms the so-called
hypergeometric distribution. From the derived formula it follows, in particular,
that, for any $0 \le n_1 \le n$,

$$\sum_{k_1=0}^{k} \binom{n_1}{k_1}\binom{n-n_1}{k-k_1} = \binom{n}{k}.$$

Example 1.2.1 In the 1980s, a version of a lottery called “Sportloto 6 out of 49”
became rather popular in Russia. A gambler chooses six from the totality of
49 sports (designated just by numbers). The prize amount is determined by how 
many sports he guesses correctly from another group of six sports, to be drawn at 
random by a mechanical device in front of the public. What is the probability that 
the gambler correctly guesses all six sports? A similar question could be asked about 
five sports, and so on. 

It is not difficult to see that this is nothing else but a problem on the hypergeometric
distribution where the gambler has labelled as “white” six items in a general
population consisting of 49 items. Therefore the probability that, of the six items
chosen at random, $k_1$ will turn out to be “white” (i.e. will coincide with those
labelled by the gambler) is equal to $P_{6,49}(k_1, k)$, where the sample size $k$ equals 6.
For example, the probability of guessing all six sports correctly is

$$P_{6,49}(6, 6) = \binom{49}{6}^{-1} \approx 7.2 \times 10^{-8}.$$
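The whole distribution $P_{6,49}(k_1, 6)$, $k_1 = 0, \ldots, 6$, can be tabulated exactly. The sketch below uses exact rational arithmetic; the names `hypergeom_pmf` and `sportloto` are illustrative and not taken from the text, and the argument order simply mirrors the notation $P_{n_1,n}(k_1,k)$.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(k1: int, k: int, n1: int, n: int) -> Fraction:
    """P_{n1,n}(k1, k): probability of exactly k1 marked items in a sample of
    size k drawn without replacement from n items, n1 of which are marked."""
    if k1 < 0 or k1 > k:
        return Fraction(0)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # which matches the convention that such binomial coefficients vanish.
    return Fraction(comb(n1, k1) * comb(n - n1, k - k1), comb(n, k))


# Distribution of the number of correct guesses in "6 out of 49".
sportloto = [hypergeom_pmf(j, 6, 6, 49) for j in range(7)]

if __name__ == "__main__":
    for j, prob in enumerate(sportloto):
        print(j, float(prob))
    # The probabilities add up to one, as the identity for binomial
    # coefficients stated after the hypergeometric formula requires.
    print(sum(sportloto))
```

Using `Fraction` avoids rounding, so the normalisation can be checked as an exact equality rather than approximately.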


In connection with the hypergeometric distribution, one could comment on the 
nature of problems in Probability Theory and Mathematical Statistics. Knowing the 
composition of the general population, we can use the hypergeometric distribution 
to find out what chances different compositions of the sample would have. This 
is a typical direct problem of probability theory. However, in the natural sciences 
one usually has to solve inverse problems: how to determine the nature of general 
populations from the composition of random samples. Generally speaking, such 
inverse problems form the subject matter of Mathematical Statistics. 


1.3 The Bernoulli Scheme 


Suppose one draws a sample with replacement of size $r$ from a general population
consisting of two elements $\{0, 1\}$. There are $2^r$ such samples. Let $p$ be a number in
the interval $[0, 1]$. Define a nonnegative function $P$ on the set $\Omega$ of all samples in the
following way: if a sample $\omega$ contains exactly $k$ ones, then $P(\omega) = p^k(1-p)^{r-k}$.
To verify that $P$ is a probability, one has to prove the equality

$$P(\Omega) = 1.$$


It is easy to see that $k$ ones can be arranged in $r$ places in $\binom{r}{k}$ different ways. Therefore
there is the same number of samples containing exactly $k$ ones. Now we can
compute the probability of $\Omega$:

$$P(\Omega) = \sum_{k=0}^{r} \binom{r}{k} p^k (1-p)^{r-k} = \big(p + (1-p)\big)^r = 1.$$

The second equality here is just the binomial formula. At the same time we have
found that the probability $P(k, r)$ that the sample contains exactly $k$ ones is:

$$P(k, r) = \binom{r}{k} p^k (1-p)^{r-k}.$$


This is the so-called binomial distribution. It can be considered as the distribution 
of the number of “successes” in a series of r trials with two possible outcomes in 
each trial: 1 (“success”) and 0 (“failure”). Such a series of trials with probability 
$P(\omega)$ defined as $p^k(1-p)^{r-k}$, where $k$ is the number of successes in $\omega$, is called
the Bernoulli scheme. It turns out that the trials in the Bernoulli scheme have the 
independence property which will be discussed in the next chapter. 

It is not difficult to verify that the probability of having 1 at a fixed place in 
the sample (say, at position s) equals p. Indeed, having removed the item number s 
from the sample, we obtain a sample from the same population, but of size r — 1. We 
will find the desired probability if we multiply the probabilities of these truncated 
samples by p and sum over all “short” samples. Clearly, we will get p. This is why 
the number p in the Bernoulli scheme is often called the success probability. 

Arguing in the same way, we find that the probability of having 1 at $k$ fixed
positions in the sample equals $p^k$.

Now consider how the probabilities $P(k, r)$ of various outcomes behave as $k$
varies. Let us look at the ratio

$$R(k, r) := \frac{P(k, r)}{P(k-1, r)} = \frac{p}{1-p} \cdot \frac{r-k+1}{k} = \frac{p}{1-p}\left(\frac{r+1}{k} - 1\right).$$

It clearly decreases monotonically as $k$ increases, the value of the ratio being greater
than 1 for $k/(r+1) < p$ and less than 1 for $k/(r+1) > p$. This means that
the probabilities $P(k, r)$ first increase and then, for $k > p(r+1)$, decrease as $k$
increases.
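The behaviour of the ratio $R(k, r)$ can be checked numerically. The following sketch uses the values $r = 30$, $p = 0.7$ purely as an illustration; the function names `binom_pmf` and `ratio` are assumptions made for this example.

```python
from math import comb


def binom_pmf(k: int, r: int, p: float) -> float:
    """P(k, r): probability of exactly k successes in r Bernoulli trials."""
    return comb(r, k) * p**k * (1 - p) ** (r - k)


def ratio(k: int, r: int, p: float) -> float:
    """R(k, r) = P(k, r) / P(k - 1, r) in closed form, for 1 <= k <= r."""
    return p / (1 - p) * (r - k + 1) / k


r, p = 30, 0.7
probs = [binom_pmf(k, r, p) for k in range(r + 1)]
# Index of the largest probability.
mode = max(range(r + 1), key=probs.__getitem__)

if __name__ == "__main__":
    print("p(r + 1) =", p * (r + 1))
    print("most probable number of successes:", mode)
    for k in (20, 21, 22, 23):
        print(k, ratio(k, r, p))
```

Here $p(r+1) = 21.7$, so the ratio exceeds 1 up to $k = 21$ and falls below 1 from $k = 22$ on, and the largest probability is attained at $k = 21$.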

The above enables one to estimate, using the quantities $P(k, r)$, the probabilities

$$Q(k, r) = \sum_{j=0}^{k} P(j, r)$$

that the number of successes in the Bernoulli scheme does not exceed $k$. Namely,
for $k < p(r+1)$,

$$Q(k, r) = P(k, r)\left(1 + \frac{1}{R(k, r)} + \frac{1}{R(k, r)R(k-1, r)} + \cdots\right) \le P(k, r)\,\frac{R(k, r)}{R(k, r) - 1} = P(k, r)\,\frac{(r+1-k)p}{(r+1)p - k}.$$


It is not difficult to see that this bound will be rather sharp if the numbers $k$ and $r$
are large and the ratio $k/(pr)$ is not too close to 1. In that case the sum

$$1 + \frac{1}{R(k, r)} + \frac{1}{R(k, r)R(k-1, r)} + \cdots$$

will be close to the sum of the geometric series

$$\sum_{j=0}^{\infty} R^{-j}(k, r) = \frac{R(k, r)}{R(k, r) - 1},$$

and we will have the approximate equality

$$Q(k, r) \approx P(k, r)\,\frac{(r+1-k)p}{(r+1)p - k}. \qquad (1.3.1)$$


For example, for $r = 30$, $p = 0.7$ and $k = 16$ one has $rp = 21$ and $P(k, r) \approx 0.023$.
Here the ratio $\frac{(r+1-k)p}{(r+1)p-k}$ equals $15 \times 0.7/5.7 \approx 1.84$. Hence the right hand
side of (1.3.1) estimating $Q(k, r)$ is approximately equal to $0.023 \times 1.84 \approx 0.042$.
The true value of $Q(k, r)$ for the given values of $r$, $p$ and $k$ is 0.040 (correct to three
decimals).
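The numbers in this example can be reproduced directly. The sketch below evaluates $P(k, r)$, the exact sum $Q(k, r)$ and the right hand side of (1.3.1); the helper names (`binom_pmf`, `binom_cdf`, `tail_bound`) are illustrative and not part of the text.

```python
from math import comb


def binom_pmf(k: int, r: int, p: float) -> float:
    """P(k, r) for the Bernoulli scheme with success probability p."""
    return comb(r, k) * p**k * (1 - p) ** (r - k)


def binom_cdf(k: int, r: int, p: float) -> float:
    """Q(k, r): probability of at most k successes, summed term by term."""
    return sum(binom_pmf(j, r, p) for j in range(k + 1))


def tail_bound(k: int, r: int, p: float) -> float:
    """Right hand side of (1.3.1); derived only for k < p (r + 1)."""
    if not k < p * (r + 1):
        raise ValueError("the bound requires k < p (r + 1)")
    return binom_pmf(k, r, p) * (r + 1 - k) * p / ((r + 1) * p - k)


r, p, k = 30, 0.7, 16
p_k = binom_pmf(k, r, p)
q_k = binom_cdf(k, r, p)
bound = tail_bound(k, r, p)

if __name__ == "__main__":
    print(f"P(k, r) = {p_k:.4f}")
    print(f"Q(k, r) = {q_k:.4f}")
    print(f"bound   = {bound:.4f}")
```

The exact sum stays below the bound, and the two differ only in the third decimal place, in agreement with the figures quoted above.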

Formula (1.3.1) will be used in the example in Sect. 5.2. 

Now consider a general population composed of $n$ items, of which $n_1$ are of
the first type and $n_2 = n - n_1$ of the second type. Draw from it a sample without
replacement of size $r$.


Theorem 1.3.1 Let $n$ and $n_1$ tend to infinity in such a way that $n_1/n \to p$, where
$p$ is a number from the interval $[0, 1]$. Then the following relation holds true for the
hypergeometric distribution:

$$P_{n_1,n}(r_1, r) \to P(r_1, r).$$


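Proof Divide both the numerator and denominator in the formula for $P_{n_1,n}(r_1, r)$
(see Sect. 1.2) by $n^r$. Putting $r_2 := r - r_1$ and $n_2 := n - n_1$, we get

$$P_{n_1,n}(r_1, r) = \frac{r!\,(n-r)!}{n!} \cdot \frac{n_1!}{r_1!\,(n_1-r_1)!} \cdot \frac{n_2!}{r_2!\,(n_2-r_2)!}$$

$$= \frac{r!}{r_1!\,r_2!} \cdot \frac{\frac{n_1}{n}\left(\frac{n_1}{n} - \frac{1}{n}\right)\left(\frac{n_1}{n} - \frac{2}{n}\right)\cdots\left(\frac{n_1}{n} - \frac{r_1-1}{n}\right)}{\left(1 - \frac{1}{n}\right)\cdots\left(1 - \frac{r-1}{n}\right)}$$

$$\times\ \frac{n_2}{n}\left(\frac{n_2}{n} - \frac{1}{n}\right)\cdots\left(\frac{n_2}{n} - \frac{r_2-1}{n}\right) \to \binom{r}{r_1} p^{r_1}(1-p)^{r_2} = P(r_1, r)$$

as $n \to \infty$. The theorem is proved.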


For sufficiently large $n$, $P_{n_1,n}(r_1, r)$ is close to $P(r_1, r)$ by the above theorem.
Therefore the Bernoulli scheme can be thought of as sampling without replacement 
from a very large general population consisting of items of two types, the proportion 
of items of the first type being p. 
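The convergence in Theorem 1.3.1 can be observed numerically. The following is a rough sketch under illustrative assumptions: $p = 0.3$, $r = 5$, and population sizes chosen as multiples of 10 so that $n_1 = pn$ is an integer; the function names are hypothetical.

```python
from math import comb


def hypergeom_pmf(r1: int, r: int, n1: int, n: int) -> float:
    """P_{n1,n}(r1, r) for sampling without replacement."""
    return comb(n1, r1) * comb(n - n1, r - r1) / comb(n, r)


def binom_pmf(r1: int, r: int, p: float) -> float:
    """P(r1, r) in the Bernoulli scheme."""
    return comb(r, r1) * p**r1 * (1 - p) ** (r - r1)


def max_gap(n: int, p: float, r: int) -> float:
    """Largest difference between the two distributions over r1 = 0..r,
    with n1 taken as the integer nearest to p * n."""
    n1 = round(p * n)
    return max(
        abs(hypergeom_pmf(r1, r, n1, n) - binom_pmf(r1, r, n1 / n))
        for r1 in range(r + 1)
    )


sizes = (20, 200, 2000, 20000)
gaps = [max_gap(n, 0.3, 5) for n in sizes]

if __name__ == "__main__":
    for n, gap in zip(sizes, gaps):
        print(n, gap)
```

The printed gaps shrink as $n$ grows, which is the practical content of the remark that the Bernoulli scheme describes sampling from a very large population.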

In conclusion we will consider two problems. 

Imagine $n$ bins in which we place at random $r$ enumerated particles. Each particle
can be placed in any of the $n$ bins, so that the total number of different allocations of
$r$ particles to $n$ bins will be $n^r$. Allocation of particles to bins can be thought of as
drawing a sample with replacement of size $r$ from a general population of $n$ items.
We will assume that we are dealing with the classical scheme, where the probability
of each outcome is $1/n^r$.

(1) What is the probability that there are exactly $r_1$ particles in the $k$-th bin?
The remaining $r - r_1$ particles which did not fall into bin $k$ are allocated to the
remaining $n - 1$ bins. There are $(n-1)^{r-r_1}$ different ways in which these $r - r_1$
particles can be placed into $n - 1$ bins. Of the totality of $r$ particles, one can choose
$r - r_1$ particles which did not fall into bin $k$ in $\binom{r}{r-r_1}$ different ways. Therefore the
desired probability is

$$\binom{r}{r-r_1}\frac{(n-1)^{r-r_1}}{n^r} = \binom{r}{r_1}\left(\frac{1}{n}\right)^{r_1}\left(1 - \frac{1}{n}\right)^{r-r_1}.$$

This probability coincides with $P(r_1, r)$ in the Bernoulli scheme with $p = 1/n$.
(2) Now let us compute the probability that at least one bin will be empty. Denote
this event by $A$. Let $A_k$ mean that the $k$-th bin is empty, then

$$A = \bigcup_{k=1}^{n} A_k.$$

To find the probability of the event $A$, we will need a formula for the probability
of a sum (union) of events. We cannot make use of the additivity of probability, for
the events $A_k$ are not disjoint in our case.


1.4 The Probability of the Union of Events. Examples 


Let us return to an arbitrary discrete probability space. 




Theorem 1.4.1 Let $A_1, A_2, \ldots, A_n$ be events. Then

$$P\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_{i=1}^{n} P(A_i) - \sum_{i<j} P(A_iA_j) + \sum_{i<j<k} P(A_iA_jA_k) - \cdots + (-1)^{n-1} P(A_1 \cdots A_n).$$
Proof One has to make use of induction and the property of probability that

$$P(A + B) = P(A) + P(B) - P(AB),$$

which we proved in Sect. 1.1. For $n = 2$ the assertion of the theorem is true. Suppose
it is true for any $n - 1$ events $A_1, \ldots, A_{n-1}$. Then, setting $B = \bigcup_{i=1}^{n-1} A_i$, we get

$$P\Big(\bigcup_{i=1}^{n} A_i\Big) = P(B + A_n) = P(B) + P(A_n) - P(A_nB).$$

Substituting here the known values

$$P(B) = P\Big(\bigcup_{i=1}^{n-1} A_i\Big) \quad \text{and} \quad P(A_nB) = P\Big(\bigcup_{i=1}^{n-1} A_iA_n\Big),$$

we obtain the assertion of the theorem.


Now we will turn to the second problem about bins (see the end of Sect. 1.3) and
find the probability of the event $A$ that at least one bin is empty. We represented $A$
in the form $\bigcup_{k=1}^{n} A_k$, where $A_k$ denotes the event that all the $r$ particles miss the
$k$-th bin. One has

$$P(A_k) = \frac{(n-1)^r}{n^r} = \Big(1 - \frac{1}{n}\Big)^r, \quad k \le n.$$

The event $A_kA_l$ means that all $r$ particles are allocated to $n - 2$ bins with labels
differing from $k$ and $l$, and therefore

$$P(A_kA_l) = \frac{(n-2)^r}{n^r} = \Big(1 - \frac{2}{n}\Big)^r, \quad k, l \le n, \ k \ne l.$$

Similarly,

$$P(A_kA_lA_m) = \frac{(n-3)^r}{n^r} = \Big(1 - \frac{3}{n}\Big)^r, \quad k, l, m \le n \ \text{distinct},$$

and so on. The probability of the event $A$ is equal by Theorem 1.4.1 to

$$P(A) = n\Big(1 - \frac{1}{n}\Big)^r - \binom{n}{2}\Big(1 - \frac{2}{n}\Big)^r + \cdots = \sum_{j=1}^{n} (-1)^{j-1}\binom{n}{j}\Big(1 - \frac{j}{n}\Big)^r.$$


Discussion of this problem will be continued in Example 4.1.5. 
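For small $n$ and $r$ the inclusion-exclusion sum for the empty-bin problem can be compared with a direct enumeration of all $n^r$ allocations. The sketch below does this with exact fractions; both function names are illustrative assumptions, and the brute-force check is feasible only for small parameters.

```python
from fractions import Fraction
from itertools import product
from math import comb


def prob_some_bin_empty(n: int, r: int) -> Fraction:
    """Inclusion-exclusion sum for the probability that at least one of the
    n bins stays empty when r particles are allocated at random."""
    return sum(
        (-1) ** (j - 1) * comb(n, j) * Fraction(n - j, n) ** r
        for j in range(1, n + 1)
    )


def prob_some_bin_empty_brute(n: int, r: int) -> Fraction:
    """Same probability by listing all n**r equally likely allocations."""
    hits = sum(
        1
        for allocation in product(range(n), repeat=r)
        if len(set(allocation)) < n  # fewer occupied bins than bins
    )
    return Fraction(hits, n**r)


if __name__ == "__main__":
    for n, r in [(2, 2), (3, 4), (4, 3), (4, 6)]:
        print(n, r, prob_some_bin_empty(n, r), prob_some_bin_empty_brute(n, r))
```

When $r < n$ some bin must remain empty, and the formula indeed returns exactly 1 in that case.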

As an example of the use of Theorem 1.4.1 we consider one more problem having 
many varied applications. This is the so-called matching problem. 

Suppose n items are arranged in a certain order. They are rearranged at random 
(all n! permutations are equally likely). What is the probability that at least one 
element retains its position? 

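There are $n!$ different permutations. Let $A_k$ denote the event that the $k$-th item
retains its position. This event is composed of $(n-1)!$ outcomes, so its probability
equals

$$P(A_k) = \frac{(n-1)!}{n!}.$$

The event $A_kA_l$ means that the $k$-th and $l$-th items retain their positions; hence

$$P(A_kA_l) = \frac{(n-2)!}{n!}, \quad \ldots, \quad P(A_1 \cdots A_n) = \frac{(n-n)!}{n!} = \frac{1}{n!}.$$

Now $\bigcup_{k=1}^{n} A_k$ is precisely the event that at least one item retains its position. Therefore
we can make use of Theorem 1.4.1 to obtain

$$P\Big(\bigcup_{k=1}^{n} A_k\Big) = \binom{n}{1}\frac{(n-1)!}{n!} - \binom{n}{2}\frac{(n-2)!}{n!} + \binom{n}{3}\frac{(n-3)!}{n!} - \cdots + \frac{(-1)^{n-1}}{n!}$$

$$= 1 - \frac{1}{2!} + \frac{1}{3!} - \cdots + \frac{(-1)^{n-1}}{n!} = 1 - \Big(1 - 1 + \frac{1}{2!} - \frac{1}{3!} + \cdots + \frac{(-1)^{n}}{n!}\Big).$$

The last expression in the parentheses is the first $n + 1$ terms of the expansion of
$e^{-1}$ into a series. Therefore, as $n \to \infty$,

$$P\Big(\bigcup_{k=1}^{n} A_k\Big) \to 1 - e^{-1}.$$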
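The matching probability can be checked against a brute-force count of permutations with at least one fixed point. This is a small sketch with illustrative function names; the enumeration is practical only for small $n$.

```python
from fractions import Fraction
from itertools import permutations
from math import exp, factorial


def prob_match_formula(n: int) -> Fraction:
    """1 - 1/2! + 1/3! - ... + (-1)**(n-1)/n!, as an exact fraction."""
    return sum(Fraction((-1) ** (j - 1), factorial(j)) for j in range(1, n + 1))


def prob_match_brute(n: int) -> Fraction:
    """Share of the n! permutations in which some item keeps its position."""
    hits = sum(
        1
        for arrangement in permutations(range(n))
        if any(arrangement[i] == i for i in range(n))
    )
    return Fraction(hits, factorial(n))


limit = 1 - exp(-1)

if __name__ == "__main__":
    for n in range(1, 8):
        print(n, float(prob_match_formula(n)), prob_match_brute(n) == prob_match_formula(n))
    print("limit:", limit)
```

The convergence to $1 - e^{-1}$ is very fast: the neglected terms of the series are bounded by $1/(n+1)!$.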


Chapter 2 
An Arbitrary Space of Elementary Events 


Abstract The chapter begins with the axiomatic construction of the probability 
space in the general case where the number of outcomes of an experiment is not 
necessarily countable. The concepts of algebra and sigma-algebra of sets are intro- 
duced and discussed in detail. Then the axioms of probability and, more generally, 
measure are presented and illustrated by several fundamental examples of measure 
spaces. The idea of extension of a measure is discussed, based on the Carathéodory
theorem (of which the proof is given in Appendix 1). Then the general elementary
properties of probability are discussed in detail in Sect. 2.2. Conditional probability 
given an event is introduced along with the concept of independence in Sect. 2.3. 
The chapter concludes with Sect. 2.4 presenting the total probability formula and 
the Bayes formula, the former illustrated by an example leading to the introduction 
of the Poisson process. 


2.1 The Axioms of Probability Theory. A Probability Space 


So far we have been considering problems in which the set of outcomes had at most
countably many elements. In such a case we defined the probability $P(A)$ using the
probabilities $P(\omega)$ of elementary outcomes $\omega$. It proved to be a function defined on
all the subsets $A$ of the space $\Omega$ of elementary events having the following properties:

(1) $P(A) \ge 0$.
(2) $P(\Omega) = 1$.
(3) For disjoint events $A_1, A_2, \ldots$

$$P\Big(\bigcup_{j} A_j\Big) = \sum_{j} P(A_j).$$


However, as we have already noted, one can easily imagine a problem in which 
the set of all outcomes is uncountable. For example, choosing a point at random 
from the segment $[t_1, t_2]$ (say, in an experiment involving measurement of temperature)
has a continuum of outcomes, for any point of the segment could be the result
of the experiment. While in experiments with finite or countable sets of outcomes 
any collection of outcomes was an event, this is not the case in this example. We will 


A.A. Borovkov, Probability Theory, Universitext, 13 
DOI 10.1007/978-1-4471-5201-9_2, © Springer-Verlag London 2013 




encounter serious difficulties if we treat any subset of the segment as an event. Here 
one needs to select a special class of subsets which will be treated as events. 

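Let the space of elementary events $\Omega$ be an arbitrary set, and $\mathcal{A}$ be a system of
subsets of $\Omega$.

Definition 2.1.1 $\mathcal{A}$ is called an algebra if the following conditions are met:

A1. $\Omega \in \mathcal{A}$.
A2. If $A \in \mathcal{A}$ and $B \in \mathcal{A}$, then

$$A \cup B \in \mathcal{A}, \quad A \cap B \in \mathcal{A}.$$

A3. If $A \in \mathcal{A}$ then $\overline{A} \in \mathcal{A}$.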


It is not hard to see that in condition A2 it suffices to require that only one of the
given relations holds. The second relation will be satisfied automatically since

$$A \cap B = \overline{\overline{A} \cup \overline{B}}.$$

An algebra $\mathcal{A}$ is sometimes called a ring since there are two operations defined
on $\mathcal{A}$ (addition and multiplication) which do not lead outside of $\mathcal{A}$. An algebra $\mathcal{A}$ is
a ring with identity, for $\Omega \in \mathcal{A}$ and $A\Omega = \Omega A = A$ for any $A \in \mathcal{A}$.


Definition 2.1.2 A class of sets $\mathfrak{F}$ is called a sigma-algebra ($\sigma$-algebra, or $\sigma$-ring,
or Borel field of events) if property A2 is satisfied for any sequences of sets:

A2'. If $\{A_n\}$ is a sequence of sets from $\mathfrak{F}$, then

$$\bigcup_{n=1}^{\infty} A_n \in \mathfrak{F}, \quad \bigcap_{n=1}^{\infty} A_n \in \mathfrak{F}.$$

Here, as was the case for A2, it suffices to require that only one of the two relations
be satisfied. The second relation will follow from the equality

$$\bigcap_{n} A_n = \overline{\bigcup_{n} \overline{A}_n}.$$


Thus an algebra is a class of sets which is closed under a finite number of operations
of taking complements, unions and intersections; a $\sigma$-algebra is a class of sets
which is closed under a countable number of such operations.

Given a set $\Omega$ and an algebra or $\sigma$-algebra $\mathfrak{F}$ of its subsets, one says that we are
given a measurable space $(\Omega, \mathfrak{F})$.

For the segment $[0, 1]$, all the sets consisting of a finite number of segments or
intervals form an algebra, but not a $\sigma$-algebra.




Consider all the $\sigma$-algebras on $[0, 1]$ containing all intervals from that segment
(there is at least one such $\sigma$-algebra, for the collection of all the subsets of a given
set clearly forms a $\sigma$-algebra). It is easy to see that the intersection of all such
$\sigma$-algebras (i.e. the collection of all the sets which belong simultaneously to all the
$\sigma$-algebras) is again a $\sigma$-algebra. It is the smallest $\sigma$-algebra containing all intervals
and is called the Borel $\sigma$-algebra. Roughly speaking, the Borel $\sigma$-algebra could be
thought of as the collection of sets obtained from intervals by taking countably many
unions, intersections and complements. This is a rather rich class of sets which is
certainly sufficient for any practical purposes. The elements of the Borel $\sigma$-algebra
are called Borel sets. Everything we have said in this paragraph equally applies to
systems of subsets of the whole real line.

Along with the intervals $(a, b)$, the one-point sets $\{a\}$ and sets of the form $(a, b]$,
$[a, b]$ and $[a, b)$ (in which $a$ and $b$ can take infinite values) are also Borel sets. This
assertion follows, for example, from the representations of the form

$$\{a\} = \bigcap_{n=1}^{\infty} (a - 1/n, a + 1/n), \quad (a, b] = \bigcap_{n=1}^{\infty} (a, b + 1/n).$$

Thus all countable sets and countable unions of intervals and segments are also
Borel sets.

For a given class $\mathcal{B}$ of subsets of $\Omega$, one can again consider the intersection of
all $\sigma$-algebras containing $\mathcal{B}$ and obtain in this way the smallest $\sigma$-algebra containing $\mathcal{B}$.

Definition 2.1.3 The smallest $\sigma$-algebra containing $\mathcal{B}$ is called the $\sigma$-algebra
generated by $\mathcal{B}$ and is denoted by $\sigma(\mathcal{B})$.
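When $\Omega$ is finite, the generated class can be computed explicitly, because closing under complements and pairwise unions must stop after finitely many steps, and in the finite case an algebra is automatically a $\sigma$-algebra. The sketch below is only an illustration under that finiteness assumption; the function name `generated_algebra` is hypothetical, and nothing of this kind works for $\Omega = [0, 1]$.

```python
from itertools import combinations


def generated_algebra(omega, generators):
    """Smallest class of subsets of a FINITE set omega that contains the
    generators and is closed under complements and pairwise unions."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)
        # Close under complements.
        for a in current:
            complement = omega - a
            if complement not in family:
                family.add(complement)
                changed = True
        # Close under pairwise unions (intersections then follow
        # from the complement rule, as noted for condition A2).
        for a, b in combinations(current, 2):
            union = a | b
            if union not in family:
                family.add(union)
                changed = True
    return family


if __name__ == "__main__":
    family = generated_algebra({1, 2, 3, 4}, [{1}, {2}])
    for event in sorted(family, key=lambda s: (len(s), sorted(s))):
        print(set(event) or "{}")
```

For generators $\{1\}$ and $\{2\}$ on $\Omega = \{1, 2, 3, 4\}$ the result has 8 elements: the points 3 and 4 are never separated, so $\{3, 4\}$ behaves as a single atom.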


In this terminology, the Borel $\sigma$-algebra in the $n$-dimensional Euclidean space
$\mathbb{R}^n$ is the $\sigma$-algebra generated by rectangles or balls. If $\Omega$ is countable, then the
$\sigma$-algebra generated by the elements $\omega \in \Omega$ clearly coincides with the $\sigma$-algebra of
all subsets of $\Omega$.

As an exercise, we suggest that the reader describe the algebra and the $\sigma$-algebra
of sets in $\Omega = [0, 1]$ generated by: (a) the intervals $(0, 1/3)$ and $(1/3, 1)$; (b) the
semi-open intervals $(a, 1]$, $0 < a < 1$; and (c) individual points.

To formalise a probabilistic problem, one has to find an appropriate measurable
space $(\Omega, \mathfrak{F})$ for the corresponding experiment. The symbol $\Omega$ denotes the set of
elementary outcomes of the experiment, while the algebra or $\sigma$-algebra $\mathfrak{F}$ specifies a
class of events. All the remaining subsets of $\Omega$ which are not elements of $\mathfrak{F}$ are not
events. Rather often it is convenient to define the class of events $\mathfrak{F}$ as the $\sigma$-algebra
generated by a certain algebra $\mathcal{A}$.

Selecting a specific algebra or $\sigma$-algebra $\mathfrak{F}$ depends, on the one hand, on the
nature of the problem in question and, on the other hand, on that of the set $\Omega$. As
we will see, one cannot always define probability in such a way that it would make
sense for any subset of $\Omega$.




We have already noted in Chap. 1 that, in probability theory, one uses, along
with the usual set theory terminology, a somewhat different terminology related to
the fact that the subsets of $\Omega$ (belonging to $\mathfrak{F}$) are interpreted as events. The set $\Omega$
itself is often called the certain event. By axioms A1 and A3, the empty set $\varnothing$ also
belongs to $\mathfrak{F}$; it is called the impossible event. The event $\overline{A}$ is called the complement
event or simply the complement of $A$. If $A \cap B = \varnothing$, then the events $A$ and $B$ are
called mutually exclusive or disjoint.

Now it remains to introduce the notion of probability. Consider a space $\Omega$ and a
system $\mathcal{A}$ of its subsets which forms an algebra of events.


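Definition 2.1.4 A probability on $(\Omega, \mathcal{A})$ is a real-valued function defined on the
sets from $\mathcal{A}$ and having the following properties:

P1. $P(A) \ge 0$ for any $A \in \mathcal{A}$.
P2. $P(\Omega) = 1$.
P3. If a sequence of events $\{A_n\}$ is such that $A_iA_j = \varnothing$ for $i \ne j$ and $\bigcup_{n=1}^{\infty} A_n \in \mathcal{A}$,
then

$$P\Big(\bigcup_{n=1}^{\infty} A_n\Big) = \sum_{n=1}^{\infty} P(A_n). \qquad (2.1.1)$$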


These properties can be considered as an axiomatic definition of probability. 
An equivalent to axiom P3 is the requirement of additivity (2.1.1) for finite collections
of events $A_j$ plus the following continuity axiom.

P3'. Let $\{B_n\}$ be a sequence of events such that $B_{n+1} \subset B_n$ and $\bigcap_{n=1}^{\infty} B_n = B \in \mathcal{A}$.
Then $P(B_n) \to P(B)$ as $n \to \infty$.


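Proof of the equivalence Assume P3 is satisfied and let $B_{n+1} \subset B_n$, $\bigcap_{n=1}^{\infty} B_n = B \in \mathcal{A}$.
Then the sequence of the events $B$, $C_k = B_k\overline{B}_{k+1}$, $k = 1, 2, \ldots$, consists
of disjoint events and $B_n = B + \bigcup_{k=n}^{\infty} C_k$ for any $n$. Now making use of property
P3 we see that the series $P(B_1) = P(B) + \sum_{k=1}^{\infty} P(C_k)$ is convergent, which means
that

$$P(B_n) = P(B) + \sum_{k=n}^{\infty} P(C_k) \to P(B)$$

as $n \to \infty$. This is just the property P3'.

Conversely, if $A_n$ is a sequence of disjoint events, then

$$P\Big(\bigcup_{k=1}^{\infty} A_k\Big) = P\Big(\bigcup_{k=1}^{n} A_k\Big) + P\Big(\bigcup_{k=n+1}^{\infty} A_k\Big)$$

and one has

$$\sum_{k=1}^{\infty} P(A_k) = \lim_{n \to \infty} \sum_{k=1}^{n} P(A_k) = \lim_{n \to \infty} P\Big(\bigcup_{k=1}^{n} A_k\Big) = \lim_{n \to \infty}\Big\{P\Big(\bigcup_{k=1}^{\infty} A_k\Big) - P\Big(\bigcup_{k=n+1}^{\infty} A_k\Big)\Big\} = P\Big(\bigcup_{k=1}^{\infty} A_k\Big).$$

The last equality follows from P3', since the events $\bigcup_{k=n+1}^{\infty} A_k$ decrease to the empty set as $n \to \infty$.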


Definition 2.1.5 A triple $(\Omega, \mathcal{A}, P)$ is called a wide-sense probability space. If an
algebra $\mathfrak{F}$ is a $\sigma$-algebra ($\mathfrak{F} = \sigma(\mathfrak{F})$), then condition $\bigcup_{n=1}^{\infty} A_n \in \mathfrak{F}$ in axiom P3 (for
a probability on $(\Omega, \mathfrak{F})$) will be automatically satisfied.

Definition 2.1.6 A triple $(\Omega, \mathfrak{F}, P)$, where $\mathfrak{F}$ is a $\sigma$-algebra, is called a probability
space.

A probability $P$ on $(\Omega, \mathfrak{F})$ is also sometimes called a probability distribution on
$\Omega$ or just a distribution on $\Omega$ (on $(\Omega, \mathfrak{F})$).

Thus defining a probability space means defining a countably additive nonnegative
measure on a measurable space such that the measure of $\Omega$ is equal to one.
In this form the axiomatics of Probability Theory was formulated by A.N. Kolmogorov.
The system of axioms we introduced is incomplete and consistent.

Constructing a probability space $(\Omega, \mathfrak{F}, P)$ is the basic stage in creating a mathematical
model (formalisation) of an experiment.

Discussions on what one should understand by probability have a long history
and are related to the desire to connect the definition of probability with its “phys- 
ical” nature. However, because of the complexity of the latter, such attempts have 
always encountered difficulties not only of mathematical, but also of philosophical 
character (see the Introduction). The most important stages in this discussion are re- 
lated to the names of Borel, von Mises, Bernstein and Kolmogorov. The emergence 
of Kolmogorov’s axiomatics separated, in a sense, the mathematical aspect of the 
problem from all the rest. With this approach, the “physical interpretation” of the 
notion of probability appears in the form of a theorem (the strong law of large num- 
bers, see Chaps. 5 and 7), by virtue of which the relative frequency of the occurrence 
of a certain event in an increasingly long series of independent trials approaches (in 
a strictly defined sense) the probability of this event. 

We now consider examples of the most commonly used measurable and proba- 
bility spaces. 

1. Discrete measurable spaces. These are spaces $(\Omega, \mathfrak{F})$ where $\Omega$ is a finite or
countably infinite collection of elements, and the $\sigma$-algebra $\mathfrak{F}$ usually consists of
all the subsets of $\Omega$. Discrete probability spaces constructed on discrete measurable
spaces were studied, with concrete examples, in Chap. 1.

2. The measurable space $(\mathbb{R}, \mathfrak{B})$, where $\mathbb{R}$ is the real line (or a part of it) and $\mathfrak{B}$
is the $\sigma$-algebra of Borel sets. The necessity of considering such spaces arises in
situations where the results of observations of interest may assume any values in $\mathbb{R}$.


Example 2.1.1 Consider an experiment consisting of choosing a point “at random”
from the interval $[0, 1]$. By this we will understand the following. The set of elementary
outcomes $\Omega$ is the interval $[0, 1]$. The $\sigma$-algebra $\mathfrak{F}$ will be taken to be the class
of subsets $B$ for which the notion of length (Lebesgue measure) $\mu(B)$ is defined,
for example, the $\sigma$-algebra $\mathfrak{B}$ of Borel measurable sets. To “conduct a trial” means
to choose a point $\omega \in \Omega = [0, 1]$, the probability of the event $\omega \in B$ being $\mu(B)$. All
the axioms are clearly satisfied for the probability space $([0, 1], \mathfrak{B}, \mu)$. We obtain
the so-called uniform distribution on $[0, 1]$.


Why did we take the $\sigma$-algebra of Borel sets $\mathfrak{B}$ to be our $\mathfrak{F}$ in this example? If we
considered on $\Omega = [0, 1]$ the $\sigma$-algebra generated by “individual” points of the interval,
we would get the sets of which the Lebesgue measure is either 0 or 1. In other
words, the obtained sets would be either very “dense” or very “thin” (countable), so
that the intervals $(a, b)$ for $0 < b - a < 1$ do not belong to this $\sigma$-algebra.

On the other hand, if we considered on $\Omega = [0, 1]$ the $\sigma$-algebra of all subsets of
$\Omega$, it would be impossible to define a probability measure on it in such a way that
$P((a, b]) = b - a$ (i.e. to get the uniform distribution).¹

Turning back to the uniform distribution $P$ on $\Omega = [0, 1]$, it is easy to see that
it is impossible to define this distribution using the same approach as we used to
define a probability on a discrete space of elementary events (i.e. by defining the
probabilities of elementary outcomes $\omega$). Since in this example the $\omega$s are individual
points from $[0, 1]$, we clearly have $P(\omega) = 0$ for any $\omega$.

3. The measurable space $(\mathbb{R}^n, \mathfrak{B}^n)$ is used in the cases when observations are
vectors. Here $\mathbb{R}^n$ is the $n$-dimensional Euclidean space ($\mathbb{R}^n = \mathbb{R}_1 \times \cdots \times \mathbb{R}_n$, where
$\mathbb{R}_1, \ldots, \mathbb{R}_n$ are $n$ copies of the real line), $\mathfrak{B}^n$ is the $\sigma$-algebra of Borel sets in $\mathbb{R}^n$,
i.e. the $\sigma$-algebra generated by the sets $B = B_1 \times \cdots \times B_n$, where $B_i \subset \mathbb{R}_i$ are Borel
sets on the line. Instead of $\mathbb{R}^n$ we could also consider some measurable part $\Omega \in \mathfrak{B}^n$
(for example a cube or ball), and instead of $\mathfrak{B}^n$ the restriction of $\mathfrak{B}^n$ onto $\Omega$. Thus,
similarly to the last example one can construct a probability space for choosing a
point at random from the cube $\Omega = [0, 1]^n$. We put here $P(\omega \in B) = \mu(B)$, where
$\mu(B)$ is the Lebesgue measure (volume) of the set $B$. Instead of the cube $[0, 1]^n$ we
could consider any other cube, for example $[a, b]^n$, but in this case we would have
to put

$$P(\omega \in B) = \mu(B)/\mu(\Omega) = \mu(B)/(b - a)^n.$$

This is the uniform distribution on a cube.

In Probability Theory one also needs to deal with more complex probability
spaces. What should one do if the result of the experiment is an infinite random sequence? In
this case the space $(\mathbb{R}^{\infty}, \mathfrak{B}^{\infty})$ is often the most appropriate one.

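4. The measurable space $(\mathbb{R}^{\infty}, \mathfrak{B}^{\infty})$, where

$$\mathbb{R}^{\infty} = \prod_{j=1}^{\infty} \mathbb{R}_j$$

is the space of all sequences $(x_1, x_2, \ldots)$ (the direct product of the spaces $\mathbb{R}_j$), and
$\mathfrak{B}^{\infty}$ the $\sigma$-algebra generated by the sets of the form

$$\Big(\prod_{k=1}^{N} B_{j_k}\Big) \times \Big(\prod_{j \ne j_k,\ k \le N} \mathbb{R}_j\Big), \qquad B_{j_k} \in \mathfrak{B}_{j_k},$$

for any $N, j_1, \ldots, j_N$, where $\mathfrak{B}_j$ is the $\sigma$-algebra of Borel sets from $\mathbb{R}_j$.

¹ See e.g. [28], p. 80.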

5. If an experiment results, say, in a continuous function on the interval $[a, b]$
(a trajectory of a moving particle, a cardiogram of a patient, etc.), then the probability
spaces considered above turn out to be inappropriate. In such a case one should
take $\Omega$ to be the space $C(a, b)$ of all continuous functions on $[a, b]$ or the space
$\mathbb{R}^{[a,b]}$ of all functions on $[a, b]$. The problem of choosing a suitable $\sigma$-algebra here
becomes somewhat more complicated and we will discuss it later in Chap. 18.

Now let us return to the definition of a probability space. 

Let a triple $(\Omega, \mathcal{A}, P)$ be a wide-sense probability space ($\mathcal{A}$ is an algebra). As
we have already seen, to each algebra $\mathcal{A}$ there corresponds a $\sigma$-algebra $\mathfrak{F} = \sigma(\mathcal{A})$
generated by $\mathcal{A}$. The following question is of substantial interest: does the probability
measure $P$ on $\mathcal{A}$ define a measure on $\mathfrak{F} = \sigma(\mathcal{A})$? And if so, does it define
it in a unique way? In other words, to construct a probability space $(\Omega, \mathfrak{F}, P)$, is
it sufficient to define the probability just on some algebra $\mathcal{A}$ generating $\mathfrak{F}$ (i.e. to
construct a wide-sense probability space $(\Omega, \mathcal{A}, P)$, where $\sigma(\mathcal{A}) = \mathfrak{F}$)? An answer
to this important question is given by the Carathéodory theorem.


The measure extension theorem Let $(\Omega, \mathcal{A}, P)$ be a wide-sense probability space.
Then there exists a unique probability measure $Q$ defined on $\mathfrak{F} = \sigma(\mathcal{A})$ such that

$$Q(A) = P(A) \quad \text{for all } A \in \mathcal{A}.$$

Corollary 2.1.1 Any wide-sense probability space $(\Omega, \mathcal{A}, P)$ automatically defines
a probability space $(\Omega, \mathfrak{F}, P)$ with $\mathfrak{F} = \sigma(\mathcal{A})$.

We will make extensive use of this fact in what follows. In particular, it implies
that to define a probability measure on the measurable space $(\mathbb{R}, \mathfrak{B})$, it suffices to
define the probability on intervals.

The proof of the Carathéodory theorem is given in Appendix 1. 

In conclusion of this section we will make a general comment. Mathematics dif- 
fers qualitatively from such sciences as physics, chemistry, etc. in that it does not 
always base its conclusions on empirical data with the help of which a naturalist 
tries to answer his questions. Mathematics develops in the framework of an initial 
construction or system of axioms with which one describes an object under study. 
Thus mathematics and, in particular, Probability Theory, studies the nature of the 
phenomena around us in a methodologically different way: one studies not the phe- 
nomena themselves, but rather the models of these phenomena that have been cre- 
ated based on human experience. The value of a particular model is determined by 




the agreement of the conclusions of the theory with our observations and therefore 
depends on the choice of the axioms characterising the object. 

In this sense axioms P1, P2, and the additivity of probability look indisputable 
and natural (see the remarks in the Introduction on desirable properties of probability).
Countable additivity of probability and the property A2' of $\sigma$-algebras are more
delicate and less easy to intuit (as incidentally are a lot of other things related to the 
notion of infinity). Introducing the last two properties was essentially brought about 
by the possibility of constructing a meaningful mathematical theory. Numerous ap- 
plications of Probability Theory developed from the system of axioms formulated 
in the present section demonstrate its high efficiency and purposefulness. 


2.2 Properties of Probability 


1. $P(\varnothing) = 0$. This follows from the equality $\varnothing + \Omega = \Omega$ and properties P2 and P3
of probability.

2. $P(\overline{A}) = 1 - P(A)$, since $A + \overline{A} = \Omega$ and $A \cap \overline{A} = \varnothing$.

3. If $A \subset B$, then $P(A) \le P(B)$. This follows from the relation $P(A) + P(\overline{A}B) = P(B)$.

4. $P(A) \le 1$ (by properties 3 and P2).

5. $P(A \cup B) = P(A) + P(B) - P(AB)$, since $A \cup B = A + (B - AB)$ and
$P(B - AB) = P(B) - P(AB)$.

6. $P(A \cup B) \le P(A) + P(B)$ follows from the previous property.

7. The formula

$$P\Big(\bigcup_{j=1}^{n} A_j\Big) = \sum_{k=1}^{n} P(A_k) - \sum_{k<l} P(A_kA_l) + \sum_{k<l<m} P(A_kA_lA_m) - \cdots + (-1)^{n-1} P(A_1 \cdots A_n)$$

has already been proved and used for discrete spaces $\Omega$. Here the reader can prove
it in exactly the same way, using induction and property 5.

Denote the sums on the right hand side of the last formula by $Z_1, Z_2, \ldots, Z_n$,
respectively. Then statement 7 for the event $B_n = \bigcup_{j=1}^{n} A_j$ can be rewritten as
$P(B_n) = \sum_{j=1}^{n} (-1)^{j-1} Z_j$.

8. An important addition to property 7 is that the sequence $\sum_{j=1}^{k} (-1)^{j-1} Z_j$
approximates $P(B_n)$ by turns from above and from below as $k$ grows, i.e.

$$P(B_n) - \sum_{j=1}^{2k-1} (-1)^{j-1} Z_j \le 0, \qquad P(B_n) - \sum_{j=1}^{2k} (-1)^{j-1} Z_j \ge 0, \quad k = 1, 2, \ldots \qquad (2.2.1)$$



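This property can also be proved by induction on $n$. For $n = 2$ this property is
ascertained in 5. Let (2.2.1) be valid for any events $A_1, \ldots, A_{n-1}$ (i.e. for any $B_{n-1}$).
Then by 5 we have

$$P(B_n) = P(B_{n-1} \cup A_n) = P(B_{n-1}) + P(A_n) - P\Big(\bigcup_{j=1}^{n-1} A_jA_n\Big),$$

where, in view of (2.2.1) for $k = 1$,

$$\sum_{j=1}^{n-1} P(A_j) - \sum_{i<j\le n-1} P(A_iA_j) \le P(B_{n-1}) \le \sum_{j=1}^{n-1} P(A_j),$$

$$P\Big(\bigcup_{j=1}^{n-1} A_jA_n\Big) \le \sum_{j=1}^{n-1} P(A_jA_n).$$

Hence, for $B_n = B_{n-1} \cup A_n$, we get

$$P(B_n) \le \sum_{j=1}^{n} P(A_j),$$

$$P(B_n) = P(B_{n-1}) + P(A_n) - P(B_{n-1}A_n) \ge \sum_{j=1}^{n-1} P(A_j) - \sum_{i<j\le n-1} P(A_iA_j) + P(A_n) - \sum_{j=1}^{n-1} P(A_jA_n) = \sum_{j=1}^{n} P(A_j) - \sum_{i<j\le n} P(A_iA_j).$$

This proves (2.2.1) for $k = 1$. For $k = 2, 3, \ldots$ the proof is similar.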
9. If $A_n$ is a monotonically increasing sequence of sets (i.e. $A_n \subset A_{n+1}$) and
$A = \bigcup_{n=1}^{\infty} A_n$, then

$$P(A) = \lim_{n \to \infty} P(A_n). \qquad (2.2.2)$$

This is a different form of the continuity axiom equivalent to P3'.

Indeed, introducing the sets $B_n = A - A_n$, we get $B_{n+1} \subset B_n$ and $\bigcap_{n=1}^{\infty} B_n = \varnothing$.
Therefore, by the continuity axiom,

$$P(A - A_n) = P(A) - P(A_n) \to 0$$

as $n \to \infty$. The converse assertion that (2.2.2) implies the continuity axiom can be
obtained in a similar way.
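The alternating inequalities (2.2.1) of property 8 can be seen on a small example. The sketch below assumes a finite space with equally likely outcomes, so that probabilities reduce to counting; the events and the function names are chosen purely for illustration.

```python
from fractions import Fraction
from itertools import combinations


def inclusion_exclusion_terms(events, omega_size):
    """Z_1, ..., Z_n for events given as sets of outcomes of a finite space
    with omega_size equally likely outcomes."""
    events = [frozenset(e) for e in events]
    terms = []
    for j in range(1, len(events) + 1):
        z = Fraction(0)
        for group in combinations(events, j):
            common = frozenset.intersection(*group)
            z += Fraction(len(common), omega_size)
        terms.append(z)
    return terms


def partial_sums(terms):
    """Sums of (-1)**(j-1) * Z_j over j = 1..k, for k = 1, ..., n."""
    sums, total = [], Fraction(0)
    for j, z in enumerate(terms, start=1):
        total += (-1) ** (j - 1) * z
        sums.append(total)
    return sums


# Outcomes 0..11; three overlapping events.
events = [
    {x for x in range(12) if x % 2 == 0},
    {x for x in range(12) if x % 3 == 0},
    {0, 1, 2, 3},
]
p_union = Fraction(len(set().union(*events)), 12)
sums = partial_sums(inclusion_exclusion_terms(events, 12))

if __name__ == "__main__":
    print("P(union) =", p_union)
    print("partial sums:", sums)
```

In this example the first partial sum overshoots the probability of the union, the second undershoots it, and the full sum reproduces it exactly.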


2.3 Conditional Probability. Independence of Events and Trials 


We will start with examples. Let an experiment consist of three tosses of a fair 
coin. The probability that heads shows up only once, i.e. that one of the elementary 




events htt, tht, or tth occurs, is equal in the classical scheme to 3/8. Denote 
this event by A. Now assume that we know in addition that the event B = 
{the number of heads is odd} has occurred. 

What is the probability of the event A given this additional information? The 
event B consists of four elementary outcomes. The event A is constituted by three 
outcomes from the event B. In the framework of the classical scheme, it is natural 
to define the new probability of the event A to be 3/4. 

Consider a more general example. Let a classical scheme with n outcomes be 
given. An event A consists of r outcomes, an event B of m outcomes, and let the 
event AB have k outcomes. Similarly to the previous example, it is natural to define 
the probability of the event A given the event B has occurred as 


\[
P(A|B) = \frac{k}{m} = \frac{k/n}{m/n}.
\]

The ratio is equal to $P(AB)/P(B)$, for

\[
P(AB) = \frac{k}{n}, \qquad P(B) = \frac{m}{n}.
\]


Now we can give a general definition. 


Definition 2.3.1 Let $(\Omega, \mathfrak{F}, P)$ be a probability space and $A$ and $B$ be arbitrary
events. If $P(B) > 0$, the conditional probability of the event $A$ given $B$ has occurred
is denoted by $P(A|B)$ and is defined by

\[
P(A|B) = \frac{P(AB)}{P(B)}.
\]


Definition 2.3.2 Events A and B are called independent if 
P(AB) = P(A) P(B). 


Below we list several properties of independent events. 
1. If P(B) > 0, then the independence of A and B is equivalent to the equality 


P(A|B) = P(A). 


The proof is obvious.
2. If $A$ and $B$ are independent, then $\bar{A}$ and $B$ are also independent.
Indeed,

\[
P(\bar{A}B) = P(B - AB) = P(B) - P(AB) = P(B)(1 - P(A)) = P(\bar{A})P(B).
\]

3. Let the events $A$ and $B_1$, and the events $A$ and $B_2$ each be independent, and
assume $B_1 B_2 = \varnothing$. Then the events $A$ and $B_1 + B_2$ are independent.


Fig. 2.1 Illustration to Example 2.3.2: the dashed rectangles represent the events A and B


The property is proved by the following chain of equalities: 


\[
P(A(B_1 + B_2)) = P(AB_1 + AB_2) = P(AB_1) + P(AB_2)
= P(A)(P(B_1) + P(B_2)) = P(A)P(B_1 + B_2).
\]


As we will see below, the requirement $B_1 B_2 = \varnothing$ is essential here.


Example 2.3.1 Let event A mean that heads shows up in the first of two tosses of a 
fair coin, and event B that tails shows up in the second toss. The probability of each 
of these events is 1/2. The probability of the intersection AB is 


\[
P(AB) = \frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2} = P(A)P(B).
\]


Therefore the events A and B are independent. 


Example 2.3.2 Consider the uniform distribution on the square $[0, 1]^2$ (see Sect. 2.1).
Let A be the event that a point chosen at random is in the region on the right of an 
abscissa a and B the event that the point is in the region above an ordinate b. 

Both regions are hatched in Fig. 2.1. The event AB is squared in the figure. 
Clearly, P(AB) = P(A)P(B), and hence the events A and B are independent. 

It is also easy to verify that if B is the event that the chosen point is inside the 
triangle FCD (see Fig. 2.1), then the events A and B will already be dependent. 


Definition 2.3.3 Events $B_1, B_2, \ldots, B_n$ are jointly independent if, for any
$1 \le i_1 < i_2 < \cdots < i_r \le n$, $r = 2, 3, \ldots, n$,

\[
P\Bigl(\bigcap_{k=1}^{r} B_{i_k}\Bigr) = \prod_{k=1}^{r} P(B_{i_k}).
\]


Pairwise independence is not sufficient for joint independence of n events, as one 
can see from the following example. 




Example 2.3.3 (Bernstein’s example) Consider the following experiment. We roll a 
symmetric tetrahedron of which three faces are painted red, blue and green respec- 
tively, and the fourth is painted in all three colours. Event R means that when the 
tetrahedron stops, the bottom face has the red colour on it, event B that it has the 
blue colour, and G the green. Since each of the three colours is present on two faces, 
P(R) = P(B) = P(G) = 1/2. For any two of the introduced events, the probability 
of the intersection is 1/4, since any two colours are present on one face only. Since 
$\frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2}$, this implies the pairwise independence of all three events. However,

\[
P(RGB) = \frac{1}{4} \ne P(R)P(B)P(G) = \frac{1}{8}.
\]
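
Bernstein's example is small enough to verify by enumeration. The sketch below is an illustration added here, assuming the four faces are equally likely; the names `faces` and `prob` are hypothetical helpers, not notation from the text.

```python
from fractions import Fraction
from itertools import combinations

# Four equally likely faces; each face is the set of colours painted on it.
faces = [{"R"}, {"B"}, {"G"}, {"R", "B", "G"}]

def prob(colours):
    """Probability that the bottom face carries every colour in `colours`."""
    hits = sum(1 for face in faces if set(colours) <= face)
    return Fraction(hits, len(faces))

singles = {c: prob(c) for c in "RBG"}

# Pairwise independence: P(XY) = P(X)P(Y) for each of the three pairs.
pairwise_ok = all(prob(a + b) == singles[a] * singles[b]
                  for a, b in combinations("RBG", 2))

# Joint independence would also require P(RGB) = P(R)P(B)P(G).
triple = prob("RBG")
triple_product = singles["R"] * singles["B"] * singles["G"]

print(pairwise_ok, triple, triple_product)
```

The pairwise check succeeds, while the triple intersection has probability 1/4 against the product 1/8, in line with the example.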


Now it is easy to construct an example in which property 3 of independent events
does not hold when $B_1 B_2 \ne \varnothing$.

An example of a sequence of jointly independent events is given by the series of 
outcomes of trials in the Bernoulli scheme. 

If we assume that each outcome was obtained as a result of a separate trial, then 
we will find that any event related to a fixed trial will be independent of any event 
related to other trials. In such cases one speaks of a sequence of independent trials. 

To give a general definition, consider two arbitrary experiments $G_1$ and $G_2$ and
denote by $(\Omega_1, \mathfrak{F}_1, P_1)$ and $(\Omega_2, \mathfrak{F}_2, P_2)$ the respective probability spaces. Consider
also the “compound” experiment $G$ with the probability space $(\Omega, \mathfrak{F}, P)$, where
$\Omega = \Omega_1 \times \Omega_2$ is the direct product of the spaces $\Omega_1$ and $\Omega_2$, and the $\sigma$-algebra $\mathfrak{F}$ is
generated by the direct product $\mathfrak{F}_1 \times \mathfrak{F}_2$ (i.e. by the events $B = B_1 \times B_2$, $B_1 \in \mathfrak{F}_1$,
$B_2 \in \mathfrak{F}_2$).


Definition 2.3.4 We will say that the trials $G_1$ and $G_2$ are independent if, for any
$B = B_1 \times B_2$, $B_1 \in \mathfrak{F}_1$, $B_2 \in \mathfrak{F}_2$ one has

\[
P(B) = P_1(B_1) P_2(B_2) = P(B_1 \times \Omega_2) P(\Omega_1 \times B_2).
\]

Independence of $n$ trials $G_1, \ldots, G_n$ is defined in a similar way, using the equality

\[
P(B) = P_1(B_1) \cdots P_n(B_n),
\]

where $B = B_1 \times \cdots \times B_n$, $B_k \in \mathfrak{F}_k$, and $(\Omega_k, \mathfrak{F}_k, P_k)$ is the probability space
corresponding to the experiment $G_k$, $k = 1, \ldots, n$.

In the Bernoulli scheme, the probability of any sequence of outcomes consisting
of $r$ zeros and ones and containing $k$ ones is equal to $p^k (1 - p)^{r-k}$. Therefore the
Bernoulli scheme may be considered as a result of $r$ independent trials in each of
which one has 1 (success) with probability $p$ and 0 (failure) with probability $1 - p$.
Thus, the probability of $k$ successes in $r$ independent trials equals $\binom{r}{k} p^k (1 - p)^{r-k}$.
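
As an illustration of the product structure, the following sketch builds the probability of each 0/1 sequence as a product of one-trial probabilities and sums over sequences with $k$ ones. The values of `p` and `r` are arbitrary assumptions chosen for the check, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb

def sequence_prob(seq, p):
    """Product measure: multiply the one-trial probabilities of each outcome."""
    result = Fraction(1)
    for outcome in seq:
        result *= p if outcome == 1 else 1 - p
    return result

def prob_k_successes(r, k, p):
    """Sum the product measure over all 0/1 sequences of length r with k ones."""
    return sum(sequence_prob(seq, p)
               for seq in product((0, 1), repeat=r) if sum(seq) == k)

p = Fraction(1, 3)   # assumed success probability
r = 5                # assumed number of trials

by_enumeration = [prob_k_successes(r, k, p) for k in range(r + 1)]
by_formula = [comb(r, k) * p**k * (1 - p)**(r - k) for k in range(r + 1)]

print(by_enumeration == by_formula, sum(by_enumeration))
```

Exact rational arithmetic is used so that the comparison with the binomial formula is an equality rather than an approximation.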

The following assertion, which is in a sense converse to the last one, is also 
true: any sequence of identical independent trials with two outcomes makes up a 
Bernoulli scheme. 

In Chap. 3 several remarks will be given on the relationship between the notions 
of independence we introduced here and the common notion of causality. 


2.4 The Total Probability Formula. The Bayes Formula 


Let $A$ be an event and $B_1, B_2, \ldots, B_n$ be mutually exclusive events having positive
probabilities such that

\[
A \subset \bigcup_{j=1}^{n} B_j.
\]

The sequence of events $B_1, B_2, \ldots$ can be infinite, in which case we put $n = \infty$. The
following total probability formula holds true:

\[
P(A) = \sum_{j=1}^{n} P(B_j) P(A|B_j).
\]


Proof It follows from the assumptions that

\[
A = \Bigl(\bigcup_{j=1}^{n} B_j\Bigr) A.
\]

Moreover, the events $AB_1, AB_2, \ldots, AB_n$ are disjoint, and hence

\[
P(A) = \sum_{j=1}^{n} P(AB_j) = \sum_{j=1}^{n} P(B_j) P(A|B_j).
\]


Example 2.4.1 In experiments with colliding electron-positron beams, the probability
that during a time unit there will occur $j$ collisions leading to the birth of new
elementary particles is equal to

\[
p_j = \frac{e^{-\lambda} \lambda^j}{j!}, \quad j = 0, 1, \ldots,
\]

where $\lambda$ is a positive parameter (this is the so-called Poisson distribution, to be
considered in more detail in Chaps. 3, 5 and 19). In each collision, different groups of
elementary particles can appear as a result of the interaction, and the probability of
each group is fixed and does not depend on the outcomes of other collisions.
Consider one such group, consisting of two $\mu$-mesons, and denote by $p$ the probability
of its appearance in a collision. What is the probability of the event $A_k$ that, during
a time unit, $k$ pairs of $\mu$-mesons will be born?

Assume that the event $B_j$ that there were $j$ collisions during the time unit has
occurred. Given this condition, we will have a sequence of $j$ independent trials, and
the probability of having $k$ pairs of $\mu$-mesons will be $\binom{j}{k} p^k (1 - p)^{j-k}$. Therefore
by the total probability formula,

\[
P(A_k) = \sum_{j=k}^{\infty} P(B_j) P(A_k|B_j)
= \sum_{j=k}^{\infty} \frac{e^{-\lambda} \lambda^j}{j!} \, \frac{j!}{k!\,(j-k)!} \, p^k (1 - p)^{j-k}
= \frac{e^{-\lambda} \lambda^k p^k}{k!} \sum_{j=0}^{\infty} \frac{(\lambda(1 - p))^j}{j!}
= \frac{e^{-\lambda p} (\lambda p)^k}{k!}.
\]

Thus we again obtain a Poisson distribution, but this time with parameter $\lambda p$.
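
The summation can also be checked numerically. The sketch below truncates the total probability series and compares it with the Poisson probability with parameter $\lambda p$; the values of `lam` and `p` and the truncation length `terms` are assumptions made only for this check.

```python
from math import comb, exp, factorial

def poisson_pmf(j, lam):
    """Poisson probability e^{-lam} lam^j / j!."""
    return exp(-lam) * lam**j / factorial(j)

def pairs_prob(k, lam, p, terms=100):
    """Total probability formula for P(A_k), truncated after `terms` summands."""
    return sum(poisson_pmf(j, lam) * comb(j, k) * p**k * (1 - p)**(j - k)
               for j in range(k, k + terms))

lam, p = 4.0, 0.3    # assumed parameter values

# Largest discrepancy between the series and the closed form over k = 0..9.
max_gap = max(abs(pairs_prob(k, lam, p) - poisson_pmf(k, lam * p))
              for k in range(10))
print(max_gap)
```

The discrepancy is at the level of floating-point rounding, since the neglected tail of the series is negligible for these parameter values.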

The solution above was not formalised. A formal solution would first of all
require the construction of a probability space. The space turns out to be rather
complex in this example. Denote by $\Omega_j$ the space of elementary outcomes in the
Bernoulli scheme corresponding to $j$ trials, and let $\omega_j$ denote an element of $\Omega_j$.
Then one could take $\Omega$ to be the collection of all pairs $\{(j, \omega_j)\}_{j=0}^{\infty}$, where the
number $j$ indicates the number of collisions, and $\omega_j$ is a sequence of “successes”
and “failures” of length $j$ (“success” stands for the birth of two $\mu$-mesons). If $\omega_j$
contains $k$ “successes”, one has to put

\[
P((j, \omega_j)) = p_j \, p^k (1 - p)^{j-k}.
\]

To get $P(A_k)$, it remains to sum up these probabilities over all $\omega_j$ containing $k$
successes and all $j \ge k$ (the idea of the total probability formula is used here tacitly
when splitting $A_k$ into the events $(j, \omega_j)$).

The fact that the number of collisions is described here by a Poisson distribution
could be understood from the following circumstances related to the nature of the
physical process. Let $B_j(t, u)$ be the event that there were $j$ collisions during the
time interval $[t, t + u)$. Then it turns out that:


(a) the pairs of events $B_j(v, t)$ and $B_k(v + t, u)$ related to non-overlapping time
intervals are independent for all $v, t, u, j$, and $k$;
(b) for small $\Delta$ the probability of a collision during the time $\Delta$ is proportional to $\Delta$:

\[
P(B_1(t, \Delta)) = \lambda\Delta + o(\Delta),
\]

and, moreover, $P(B_k(t, \Delta)) = o(\Delta)$ for $k \ge 2$.


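Again using the total probability formula with the hypotheses $B_j(v, t)$, we obtain
for the probabilities $p_k(t) = P(B_k(v, t))$ the following relations:

\[
\begin{aligned}
p_k(t + \Delta) &= \sum_{j=0}^{k} p_j(t) P(B_k(v, t + \Delta) \,|\, B_j(v, t))
= \sum_{j=0}^{k} p_j(t) P(B_{k-j}(v + t, \Delta)) \\
&= o(\Delta) + p_{k-1}(t)(\lambda\Delta + o(\Delta)) + p_k(t)(1 - \lambda\Delta - o(\Delta)), \quad k \ge 1; \\
p_0(t + \Delta) &= p_0(t)(1 - \lambda\Delta - o(\Delta)).
\end{aligned}
\]

Transforming the last equation, we find that

\[
\frac{p_0(t + \Delta) - p_0(t)}{\Delta} = -\lambda p_0(t) + o(1).
\]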




Therefore the derivative of $p_0$ exists and is given by

\[
p_0'(t) = -\lambda p_0(t).
\]

In a similar way we establish the existence of

\[
p_k'(t) = \lambda p_{k-1}(t) - \lambda p_k(t), \quad k \ge 1. \tag{2.4.1}
\]

Now note that since the functions $p_k(t)$ are continuous, one should put $p_0(0) = 1$,
$p_k(0) = 0$ for $k \ge 1$. Hence

\[
p_0(t) = e^{-\lambda t}.
\]

Using induction and substituting into (2.4.1) the function
$p_{k-1}(t) = \frac{(\lambda t)^{k-1} e^{-\lambda t}}{(k-1)!}$, we establish (it is convenient to make the
substitution $p_k = e^{-\lambda t} u_k$, which turns (2.4.1) into $u_k' = \frac{\lambda (\lambda t)^{k-1}}{(k-1)!}$) that

\[
p_k(t) = \frac{(\lambda t)^k e^{-\lambda t}}{k!}, \quad k = 0, 1, \ldots
\]

This is the Poisson distribution with parameter $\lambda t$.
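
The system (2.4.1) can also be integrated numerically and compared with the closed form. The sketch below is an illustration only: it uses the classical fourth-order Runge-Kutta scheme (a choice made here, not a method from the text), and the values of `lam`, `t_end` and `k_max` are assumptions.

```python
from math import exp, factorial

def derivative(p, lam):
    """Right-hand side: p_0' = -lam p_0, p_k' = lam p_{k-1} - lam p_k."""
    return [-lam * p[0]] + [lam * (p[k - 1] - p[k]) for k in range(1, len(p))]

def solve(lam, t_end, k_max, steps=2000):
    """Fourth-order Runge-Kutta integration of p_0, ..., p_{k_max} on [0, t_end]."""
    h = t_end / steps
    p = [1.0] + [0.0] * k_max          # p_0(0) = 1, p_k(0) = 0 for k >= 1
    for _ in range(steps):
        k1 = derivative(p, lam)
        k2 = derivative([x + 0.5 * h * d for x, d in zip(p, k1)], lam)
        k3 = derivative([x + 0.5 * h * d for x, d in zip(p, k2)], lam)
        k4 = derivative([x + h * d for x, d in zip(p, k3)], lam)
        p = [x + h * (a + 2 * b + 2 * c + d) / 6
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p

lam, t_end, k_max = 2.0, 1.5, 6        # assumed values
numeric = solve(lam, t_end, k_max)
exact = [(lam * t_end)**k * exp(-lam * t_end) / factorial(k)
         for k in range(k_max + 1)]
max_error = max(abs(a - b) for a, b in zip(numeric, exact))
print(max_error)
```

Because the equation for $p_k$ involves only $p_{k-1}$ and $p_k$, cutting the system off at `k_max` introduces no truncation error in the retained components.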

To understand the construction of the probability space in this problem, one
should consider the set $\Omega$ of all non-decreasing step-functions $x(t) \ge 0$, $t \ge 0$,
taking values $0, 1, 2, \ldots$. Any such function can play the role of an elementary
outcome: its jump points indicate the collision times, the value $x(t)$ itself will be the
number of collisions during the time interval $(0, t)$. To avoid a tedious argument
related to introducing an appropriate $\sigma$-algebra, for the purposes of our computations
we could treat the probability as given on the algebra $\mathcal{A}$ (see Sect. 2.1) generated
by the sets $\{x(t) = k\}$, $t \ge 0$; $k = 0, 1, \ldots$ (note that all the events considered in this
problem are just of such form). The above argument shows that one has to put

\[
P(x(v + t) - x(v) = k) = \frac{(\lambda t)^k e^{-\lambda t}}{k!}.
\]


(See also the treatment of Poisson processes in Chap. 19.) 


By these examples we would like not only to illustrate the application of the total 
probability formula, but also to show that the construction of probability spaces in 
real problems is not always a simple task. 

Of course, such constructions are by no means necessary for each particular
problem, but we recommend carrying them out until one acquires sufficient
experience.

Assume that events $A$ and $B_1, \ldots, B_n$ satisfy the conditions stated at the
beginning of this section. If $P(A) > 0$, then under these conditions the following Bayes’
formula holds true:

\[
P(B_j|A) = \frac{P(B_j) P(A|B_j)}{\sum_{k=1}^{n} P(B_k) P(A|B_k)}.
\]

This formula is simply an alternative way of writing the equality

\[
P(B_j|A) = \frac{P(B_j A)}{P(A)},
\]

where in the numerator one should make use of the definition of conditional
probability, and in the denominator, the total probability formula. In Bayes’ formula we
can take $n = \infty$, just as for the total probability formula.


Example 2.4.2 An item is manufactured by two factories. The production volume 
of the first factory is k times the production of the second one. The proportion of 
defective items for the first factory is P;, and for the second one P2. Now assume 
that the items manufactured by the factories during a certain time interval were 
mixed up and then sent to retailers. What is the probability that you have purchased 
an item produced by the second factory given the item proved to be defective? 

Let $B_1$ be the event that the item you have got came from the first factory, and
$B_2$ from the second. It is easy to see that

\[
P(B_1) = \frac{k}{1 + k}, \qquad P(B_2) = \frac{1}{1 + k}.
\]

These are the so-called prior probabilities of the events $B_1$ and $B_2$. Let $A$ be the
event that the purchased item is defective. We are given conditional probabilities
$P(A|B_1) = P_1$ and $P(A|B_2) = P_2$. Now, using Bayes’ formula, we can answer the
posed question:

\[
P(B_2|A) = \frac{\frac{1}{1+k} P_2}{\frac{k}{1+k} P_1 + \frac{1}{1+k} P_2} = \frac{P_2}{k P_1 + P_2}.
\]

Similarly, $P(B_1|A) = \frac{k P_1}{k P_1 + P_2}$.


The probabilities $P(B_1|A)$ and $P(B_2|A)$ are sometimes called posterior
probabilities of the events $B_1$ and $B_2$ respectively, after the event $A$ has occurred.
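
The two-factory computation is a direct instance of Bayes’ formula and can be sketched as follows. The numerical values of the volume ratio and the defect rates are assumptions chosen for illustration, and `posterior` is a hypothetical helper name.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' formula: returns P(B_j | A) for every hypothesis B_j."""
    joint = [pr * lk for pr, lk in zip(priors, likelihoods)]
    total = sum(joint)                 # P(A) by the total probability formula
    return [x / total for x in joint]

k = 3                                          # assumed volume ratio
P1, P2 = Fraction(1, 100), Fraction(2, 100)    # assumed defect proportions
priors = [Fraction(k, 1 + k), Fraction(1, 1 + k)]

post_b1, post_b2 = posterior(priors, [P1, P2])
print(post_b1, post_b2)
```

With these assumed values the posterior probability of the second factory is $P_2 / (kP_1 + P_2) = 2/5$, even though its prior probability is only 1/4.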


Example 2.4.3 A student is asked to solve a numerical problem. The answer to
the problem is known to be one of the numbers $1, \ldots, k$. Solving the problem, the
student can either find the correct way of reasoning or err. The training of the student
is such that he finds a correct way of solving the problem with probability $p$. In
that case the answer he finds coincides with the right one. With the complementary
probability $1 - p$ the student makes an error. In that case we will assume that the
student can give as an answer any of the numbers $1, \ldots, k$ with equal probabilities
$1/k$.

We know that the student gave a correct answer. What is the probability that his 
solution of the problem was correct? 

Let $B_1$ ($B_2$) be the event that the student’s solution was correct (wrong).
Then, by our assumptions, the prior probabilities of these events are $P(B_1) = p$,
$P(B_2) = 1 - p$. If the event $A$ means that the student got a correct answer, then
$P(A|B_1) = 1$, $P(A|B_2) = 1/k$.
By Bayes’ formula the desired posterior probability $P(B_1|A)$ is equal to

\[
P(B_1|A) = \frac{P(B_1) P(A|B_1)}{P(B_1) P(A|B_1) + P(B_2) P(A|B_2)}
= \frac{p}{p + \frac{1-p}{k}} = \frac{1}{1 + \frac{1-p}{kp}}.
\]

Clearly, $P(B_1|A) > P(B_1) = p$ and $P(B_1|A)$ is close to 1 for large $k$.
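
To see how quickly the posterior probability approaches 1 as $k$ grows, one can evaluate the formula for a few values of $k$. This is a small sketch; the value of `p` and the function name are assumptions made for illustration.

```python
from fractions import Fraction

def prob_reasoning_correct(p, k):
    """P(B_1 | A) = p / (p + (1 - p) / k)."""
    return p / (p + (1 - p) / k)

p = Fraction(1, 2)                     # assumed prior probability
values = [prob_reasoning_correct(p, k) for k in (2, 10, 100)]
print(values)
```

For $p = 1/2$ the posterior probabilities are 2/3, 10/11 and 100/101: a correct answer is stronger evidence of correct reasoning when a lucky guess is less likely.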




Chapter 3 
Random Variables and Distribution Functions 


Abstract Section 3.1 introduces the formal definitions of random variable and its 
distribution, illustrated by several examples. The main properties of distribution 
functions, including a characterisation theorem for them, are presented in Sect. 3.2. 
This is followed by listing and briefly discussing the key univariate distributions. 
The second half of the section is devoted to considering the three types of distri- 
butions on the real line and the distributions of functions of random variables. In 
Sect. 3.3 multivariate random variables (random vectors) and their distributions are 
introduced and discussed in detail, including the two key special cases: the multi- 
nomial and the normal (Gaussian) distributions. After that, the concepts of indepen- 
dence of random variables and that of classes of events are considered in Sect. 3.4, 
establishing criteria for independence of random variables of different types. The 
theorem on independence of sigma-algebras generated by independent algebras of 
events is proved with the help of the probability approximation theorem. Then the 
relationships between the introduced notions are extensively discussed. In Sect. 3.5, 
the problem of existence of infinite sequences of random variables is solved with 
the help of Kolmogorov’s theorem on families of consistent distributions, which is 
proved in Appendix 2. Section 3.6 is devoted to discussing the concept of integral in 
the context of Probability Theory (a formal introduction to Integration Theory is pre- 
sented in Appendix 3). The integrals of functions of random vectors are discussed, 
including the derivation of the convolution formulae for sums of independent ran- 
dom variables. 


3.1 Definitions and Examples 


Let $(\Omega, \mathfrak{F}, P)$ be an arbitrary probability space.


Definition 3.1.1 A random variable $\xi$ is a measurable function $\xi = \xi(\omega)$ mapping
$(\Omega, \mathfrak{F})$ into $(\mathbb{R}, \mathfrak{B})$, where $\mathbb{R}$ is the set of real numbers and $\mathfrak{B}$ is the $\sigma$-algebra of all
Borel sets, i.e. a function for which the inverse image $\xi^{(-1)}(B) = \{\omega : \xi(\omega) \in B\}$ of
any Borel set $B \in \mathfrak{B}$ is a set from the $\sigma$-algebra $\mathfrak{F}$.


A.A. Borovkov, Probability Theory, Universitext, 31 
DOI 10.1007/978-1-4471-5201-9_3, © Springer-Verlag London 2013 




For example, when tossing a coin once, $\Omega$ consists of two points: heads and tails.
If we put 1 in correspondence to heads and 0 to tails, we will clearly obtain a random
variable.

The number of points showing on a die will also be a random variable.

The distance from the origin to a point chosen at random in the square
$[0 \le x \le 1, 0 \le y \le 1]$ will also be a random variable, since the set $\{(x, y) : x^2 + y^2 < t\}$
is measurable. The reader might have already noticed that in these examples it is
very difficult to come up with a non-measurable function of $\omega$ which would be
related to any real problem. This is often the case, but not always. In Chap. 18, devoted
to random processes, we will be interested in sets which, generally speaking, are not
events and which require special modifications to be regarded as events.

As we have already mentioned above, it follows from the definition of a random
variable that, for any set $B$ from the $\sigma$-algebra $\mathfrak{B}$ of Borel sets on the real line,

\[
\xi^{(-1)}(B) = \{\omega : \xi(\omega) \in B\} \in \mathfrak{F}.
\]

Hence one can define a probability $\mathbf{F}_\xi(B) = P(\xi \in B)$ on the measurable space
$(\mathbb{R}, \mathfrak{B})$ which generates the probability space $(\mathbb{R}, \mathfrak{B}, \mathbf{F}_\xi)$.


Definition 3.1.2 The probability $\mathbf{F}_\xi(B)$ is called the distribution of the random
variable $\xi$.


Putting $B = (-\infty, x)$ one obtains the function

\[
F_\xi(x) = \mathbf{F}_\xi((-\infty, x)) = P(\xi < x)
\]

defined on the whole real line which is called the distribution function¹ of the
random variable $\xi$.

We will see below that the distribution function of a random variable completely
specifies its distribution and is often used to describe the latter.

Where it leads to no confusion, we will write just $\mathbf{F}$, $F(x)$ instead of $\mathbf{F}_\xi$, $F_\xi(x)$,
respectively. More generally, in what follows, as a rule, we will be using boldface
letters $\mathbf{F}$, $\mathbf{G}$, $\mathbf{I}$, $\boldsymbol{\Phi}$, $\mathbf{K}$, $\boldsymbol{\Pi}$, etc. to denote distributions, and the standard font letters $F$,
$G$, $I$, $\Phi$, ... to denote the respective distribution functions.

Since a random variable $\xi$ is a mapping of $\Omega$ into $\mathbb{R}$, one has $P(|\xi| < \infty) = 1$.
Sometimes, it is also convenient to consider along with such random variables
random variables which can assume the values $\pm\infty$ (they will be measurable
mappings of $\Omega$ into $\mathbb{R} \cup \{\pm\infty\}$). If $P(|\xi| = \infty) > 0$, we will call such random variables
$\xi(\omega)$ improper. Each situation where such random variables appear will be
explicitly noted.


¹ In the English language literature, the distribution function is conventionally defined as
$F_\xi(x) = P(\xi \le x)$. The only difference is that, with the latter definition, $F$ will be
right-continuous, cf. property F3 below.


Example 3.1.1 Consider the Bernoulli scheme with success probability $p$ and
sample size $k$ (see Sect. 3.3). As we know, the set of elementary outcomes $\Omega$ in this case
is the set of all $k$-tuples of zeros and ones. Take the $\sigma$-algebra $\mathfrak{F}$ to be the system of
all subsets of $\Omega$. Define a random variable on $\Omega$ as follows: to each $k$-tuple of zeros
and ones we relate the number of ones in this tuple.

The probability of $r$ successes is, as we already know,

\[
P(r, k) = \binom{k}{r} p^r (1 - p)^{k-r}.
\]

Therefore the distribution function $F(x)$ of our random variable will be defined
as

\[
F(x) = \sum_{r < x} P(r, k).
\]

Here the summation is over all integers $r$ which are less than $x$. If $x \le 0$ then
$F(x) = 0$, and if $x > k$ then $F(x) = 1$.
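
The strict inequality in the definition $F(x) = P(\xi < x)$ matters at the integer points. The following sketch evaluates the binomial distribution function with exact arithmetic; the sample size and success probability are assumed values, and the function names are introduced here for illustration.

```python
from fractions import Fraction
from math import comb

def binom_pmf(r, k, p):
    """P(r, k): probability of r successes in k trials."""
    return comb(k, r) * p**r * (1 - p)**(k - r)

def dist_func(x, k, p):
    """F(x) = P(xi < x): sum over integers r with 0 <= r <= k and r < x."""
    return sum((binom_pmf(r, k, p) for r in range(k + 1) if r < x), Fraction(0))

k, p = 4, Fraction(1, 2)               # assumed sample size and probability

print(dist_func(0, k, p), dist_func(1, k, p), dist_func(1.5, k, p),
      dist_func(4, k, p), dist_func(4.5, k, p))
```

At $x = 1$ the value is $P(0, k) = 1/16$, while just to the right of 1 it is already $5/16$: the jump $P(1, k)$ is not included at the point itself, which is the left-continuity of this convention.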


Example 3.1.2 Suppose we choose a point at random from the segment $[a, b]$, i.e.
the probability that the chosen point is in a subset of $[a, b]$ is taken to be proportional
to the Lebesgue measure of this subset. Here, $\Omega$ is the segment $[a, b]$, the $\sigma$-algebra
$\mathfrak{F}$ is the class of Borel subsets of $[a, b]$. Define a random variable $\xi$ by

\[
\xi(\omega) = \omega, \quad \omega \in [a, b],
\]

i.e. the value of the random variable is equal to the number from $[a, b]$ we have
chosen. It is a measurable function. If $x \le a$, then $F(x) = P(\xi < x) = 0$. Let $x \in (a, b]$.
Then $\{\xi < x\}$ means that the point is in the interval $[a, x)$. The probability of this
event is proportional to the length of the interval, hence

\[
F(x) = P(\xi < x) = \frac{x - a}{b - a}.
\]

If $x > b$, then clearly $F(x) = 1$. Finally, we find that

\[
F(x) =
\begin{cases}
0, & x \le a, \\
\dfrac{x - a}{b - a}, & a < x \le b, \\
1, & x > b.
\end{cases}
\tag{3.1.1}
\]

This distribution function defines the so-called uniform distribution on the interval
$[a, b]$.

If $\mu(B)$ is the Lebesgue measure on $(\mathbb{R}, \mathfrak{B})$, then, as we will see in the next
section, it is not hard to show that in this case $\mathbf{F}_\xi(B) = \mu(B \cap [a, b])/(b - a)$.


3.2 Properties of Distribution Functions. Examples 


3.2.1 The Basic Properties of Distribution Functions 


Let $F(x)$ be the distribution function of a random variable $\xi$. Then $F(x)$ has the
following properties:


F1. Monotonicity: if $x_1 \le x_2$, then $F(x_1) \le F(x_2)$.
F2. $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$.
F3. Left-continuity: $\lim_{x \uparrow x_0} F(x) = F(x_0)$.


Proof Since for $x_1 \le x_2$ one has $\{\xi < x_1\} \subseteq \{\xi < x_2\}$, F1 immediately follows from
property 3 of probability (see Sect. 2.2).

To prove F2, consider two number sequences $\{x_n\}$ and $\{y_n\}$ such that $\{x_n\}$ is
decreasing and $x_n \to -\infty$, while $\{y_n\}$ is increasing and $y_n \to \infty$. Put $A_n = \{\xi < x_n\}$
and $B_n = \{\xi < y_n\}$. Since $x_n$ tends monotonically to $-\infty$, the sequence of sets $A_n$
decreases monotonically to $\bigcap A_n = \varnothing$. By the continuity axiom (see Sect. 2.1),
$P(A_n) \to 0$ as $n \to \infty$ or, which is the same, $\lim_{n \to \infty} F(x_n) = 0$. This and the
monotonicity of $F(x)$ imply that

\[
\lim_{x \to -\infty} F(x) = 0.
\]

Since the sequence $\{y_n\}$ tends monotonically to $\infty$, the sequence of sets $B_n$
increases to $\bigcup B_n = \Omega$, and hence (see property 9 in Sect. 2.2) $P(B_n) \to 1$. This
implies, as above, that

\[
\lim_{n \to \infty} F(y_n) = 1, \qquad \lim_{x \to \infty} F(x) = 1.
\]

Property F3 is proved in a similar way. Let $\{x_n\}$ be an increasing sequence with
$x_n \uparrow x_0$,

\[
A = \{\xi < x_0\}, \qquad A_n = \{\xi < x_n\}.
\]

The sequence of sets $A_n$ also increases, and $\bigcup A_n = A$. Therefore, $P(A_n) \to P(A)$.
This means that

\[
\lim_{x \uparrow x_0} F(x) = F(x_0).
\]

It is not hard to see that the function $F$ would be right-continuous if we put
$F(x) = P(\xi \le x)$.

With our definition, the function $F$ is generally speaking not right-continuous,
since by the continuity axiom

\[
F(x + 0) - F(x) = \lim_{n \to \infty} \Bigl( F\bigl(x + \tfrac{1}{n}\bigr) - F(x) \Bigr)
= \lim_{n \to \infty} P\bigl(x \le \xi < x + \tfrac{1}{n}\bigr)
= P\Bigl( \bigcap_{n=1}^{\infty} \bigl\{ \xi \in \bigl[x, x + \tfrac{1}{n}\bigr) \bigr\} \Bigr)
= P(\xi = x).
\]

This means that $F(x)$ is continuous if and only if $P(\xi = x) = 0$ for any $x$.
Examples 3.1.1 and 3.1.2 show that both continuous and discontinuous $F(x)$ are quite
common.

From the above relations it also follows that

\[
P(x \le \xi \le y) = \mathbf{F}_\xi([x, y]) = F(y + 0) - F(x).
\]




Theorem 3.2.1 If a function $F(x)$ has properties F1, F2 and F3, then there exist a
probability space $(\Omega, \mathfrak{F}, P)$ and a random variable $\xi$ such that $F_\xi(x) = F(x)$.


Proof First we construct a probability space $(\Omega, \mathfrak{F}, P)$. Take $\Omega$ to be the real line $\mathbb{R}$,
$\mathfrak{F}$ the $\sigma$-algebra $\mathfrak{B}$ of Borel sets. As we already know (see Sect. 2.1), to construct
a probability space $(\mathbb{R}, \mathfrak{B}, P)$ it suffices to define a probability on the algebra $\mathcal{A}$
generated, say, by the semi-intervals of the form $[\cdot, \cdot)$ (then $\sigma(\mathcal{A}) = \mathfrak{B}$). An arbitrary
element of the algebra $\mathcal{A}$ has the form of a finite union of disjoint semi-intervals:

\[
A = \bigcup_{i=1}^{n} [a_i, b_i), \quad a_i < b_i
\]

(the values of $a_i$ and $b_i$ can be infinite). We define

\[
P(A) = \sum_{i=1}^{n} \bigl( F(b_i) - F(a_i) \bigr).
\]

It is absolutely clear that axioms P1 and P2 are satisfied by virtue of F1 and F2. It
remains to verify the countable additivity, or continuity, of $P$ on the algebra $\mathcal{A}$. Let
$B_n \in \mathcal{A}$, $B_{n+1} \subset B_n$, $\bigcap_{n=1}^{\infty} B_n = B \in \mathcal{A}$. One has to show that $P(B_n) \to P(B)$ as
$n \to \infty$ or, which is the same, that $P(B_n \bar{B}) \to 0$ ($B_n \bar{B} \in \mathcal{A}$). To this end, it suffices
to prove that, for any fixed $N$, $P(B_n \bar{B} C_N) \to 0$, where $C_N = [-N, N)$. Indeed, for
any given $\varepsilon > 0$, by virtue of F2 we can choose an $N$ such that $P(\bar{C}_N) < \varepsilon$. Then
$P(B_n \bar{B} \bar{C}_N) \le P(\bar{C}_N) < \varepsilon$ and

\[
\limsup_{n \to \infty} P(B_n \bar{B}) \le \limsup_{n \to \infty} P(B_n \bar{B} C_N) + \varepsilon.
\]

Since $\varepsilon$ is arbitrary, the convergence $P(B_n \bar{B} C_N) \to 0$ as $n \to \infty$ implies the
required convergence $P(B_n \bar{B}) \to 0$. It follows that we can assume that the sets $B_n$ are
bounded ($B_n \subset [-N, N)$ for some $N < \infty$). Moreover, we can assume without loss
of generality that $B$ is the empty set.
By the above remarks, $B_n$ admits the representation

\[
B_n = \bigcup_{i=1}^{k_n} [a_i^n, b_i^n), \quad k_n < \infty,
\]

where $a_i^n$, $b_i^n$ are finite. Further note that, for a given $\varepsilon > 0$ and any semi-interval
$[a, b)$, one can always find an embedded interval $[a, b - \delta)$, $\delta > 0$, such that
$P([a, b - \delta)) \ge P([a, b)) - \varepsilon$. This follows directly from property F3: $F(b - \delta) \to F(b)$
as $\delta \downarrow 0$. Hence, for a given $\varepsilon > 0$ and set $B_n$, there exist $\delta_i^n > 0$, $i = 1, \ldots, k_n$,
such that

\[
\tilde{B}_n = \bigcup_{i=1}^{k_n} [a_i^n, b_i^n - \delta_i^n) \subset B_n, \qquad P(\tilde{B}_n) > P(B_n) - \varepsilon 2^{-n}.
\]




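Now add the right end points of the semi-intervals to the set $\tilde{B}_n$ and consider the
closed bounded set

\[
K_n = \bigcup_{i=1}^{k_n} [a_i^n, b_i^n - \delta_i^n].
\]

Clearly,

\[
\tilde{B}_n \subset K_n \subset B_n, \qquad K = \bigcap_{n=1}^{\infty} K_n = \varnothing, \qquad
P(B_n - K_n) = P(B_n \bar{K}_n) \le \varepsilon 2^{-n}.
\]

It follows from the relation $K = \varnothing$ that $\bigcap_{n=1}^{n_0} K_n = \varnothing$ for all sufficiently large $n_0$.
Indeed, all the sets $K_n$ belong to the closure $[C_N] = [-N, N]$ which is compact. The sets
$\{A_n = [C_N] - K_n\}_{n=1}^{\infty}$ form an open covering of $[C_N]$, since

\[
\bigcup_n A_n = [C_N] \cap \Bigl( \bigcup_n \bar{K}_n \Bigr) = [C_N] \cap \overline{\Bigl( \bigcap_n K_n \Bigr)} = [C_N].
\]

Thus, by the Heine-Borel lemma there exists a finite subcovering $\{A_n\}_{n=1}^{n_0}$, $n_0 < \infty$,
such that $\bigcup_{n=1}^{n_0} A_n = [C_N]$ or, which is the same, $\bigcap_{n=1}^{n_0} K_n = \varnothing$. Therefore

\[
P(B_{n_0}) = P\Bigl( B_{n_0} \overline{\Bigl( \bigcap_{n=1}^{n_0} K_n \Bigr)} \Bigr)
= P\Bigl( B_{n_0} \Bigl( \bigcup_{n=1}^{n_0} \bar{K}_n \Bigr) \Bigr)
= P\Bigl( \bigcup_{n=1}^{n_0} B_{n_0} \bar{K}_n \Bigr)
\le P\Bigl( \bigcup_{n=1}^{n_0} B_n \bar{K}_n \Bigr)
\le \sum_{n=1}^{n_0} \frac{\varepsilon}{2^n} < \varepsilon.
\]

Thus, for a given $\varepsilon > 0$ we found an $n_0$ (depending on $\varepsilon$) such that $P(B_{n_0}) < \varepsilon$.
This means that $P(B_n) \to 0$ as $n \to \infty$. We proved that axiom P3 holds.
So we have constructed a probability space. It remains to take $\xi$ to be the identity
mapping of $\mathbb{R}$ onto itself. Then

\[
F_\xi(x) = P(\xi < x) = P((-\infty, x)) = F(x).
\]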


The model of the sample probability space based on the assertion just proved is
often used in studies of distribution functions.


Definition 3.2.1 A probability space $(\Omega, \mathfrak{F}, \mathbf{F})$ is called a sample space for a
random variable $\xi(\omega)$ if $\Omega$ is a subset of the real line $\mathbb{R}$ and $\xi(\omega) \equiv \omega$.


The probability $\mathbf{F} = \mathbf{F}_\xi$ is called, in accordance with Definition 3.1.2 from
Sect. 3.1, the distribution of $\xi$. We will write this as

\[
\xi \subset\!\!\!= \mathbf{F}. \tag{3.2.1}
\]

It is obvious that constructing a sample probability space is always possible. It
suffices to put $\Omega = \mathbb{R}$, $\mathfrak{F} = \mathfrak{B}$, $\mathbf{F}(B) = P(\xi \in B)$. For integer-valued variables
$\xi$ the space $(\Omega, \mathfrak{F})$ can be chosen in a more “economical” way by taking
$\Omega = \{\ldots, -1, 0, 1, \ldots\}$.

Since by Theorem 3.2.1 the distribution function $F(x)$ of a random variable $\xi$
uniquely specifies the distribution $\mathbf{F}$ of this random variable, along with (3.2.1) we
will also write $\xi \subset\!\!\!= F$.

Now we will give examples of some of the most common distributions. 


3.2.2 The Most Common Distributions 


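1. The degenerate distribution $\mathbf{I}_a$. The distribution $\mathbf{I}_a$ is defined by

\[
\mathbf{I}_a(B) =
\begin{cases}
1 & \text{if } a \in B, \\
0 & \text{if } a \notin B.
\end{cases}
\]

This distribution is concentrated at the point $a$: if $\xi \subset\!\!\!= \mathbf{I}_a$, then $P(\xi = a) = 1$. The
distribution function of $\mathbf{I}_a$ has the form

\[
F(x) =
\begin{cases}
0 & \text{for } x \le a, \\
1 & \text{for } x > a.
\end{cases}
\]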


The next two distributions were described in Examples 3.1.1 and 3.1.2 of
Sect. 3.1.
2. The binomial distribution $\mathbf{B}^n_p$. By the definition, $\xi \subset\!\!\!= \mathbf{B}^n_p$ ($n > 0$ is an integer,
$p \in (0, 1)$) if $P(\xi = k) = \binom{n}{k} p^k (1 - p)^{n-k}$, $0 \le k \le n$. The distribution $\mathbf{B}^1_p$ will be
denoted by $\mathbf{B}_p$.
3. The uniform distribution $\mathbf{U}_{a,b}$. If $\xi \subset\!\!\!= \mathbf{U}_{a,b}$, then

\[
P(\xi \in B) = \frac{\mu(B \cap [a, b])}{\mu([a, b])},
\]

where $\mu$ is the Lebesgue measure. We saw that this distribution has distribution
function (3.1.1).

The next distribution plays a special role in probability theory, and we will en- 
counter it many times. 

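4. The normal distribution $\boldsymbol{\Phi}_{a,\sigma^2}$ (the normal or Gaussian law). We will write
$\xi \subset\!\!\!= \boldsymbol{\Phi}_{a,\sigma^2}$ if

\[
P(\xi \in B) = \boldsymbol{\Phi}_{a,\sigma^2}(B) = \frac{1}{\sigma\sqrt{2\pi}} \int_B e^{-(u-a)^2/(2\sigma^2)} \, du. \tag{3.2.2}
\]

The distribution $\boldsymbol{\Phi}_{a,\sigma^2}$ depends on two parameters: $a$ and $\sigma > 0$. If $a = 0$, $\sigma = 1$, the
normal distribution is called standard. The distribution function of $\boldsymbol{\Phi}_{0,1}$ is equal to

\[
\Phi(x) = \boldsymbol{\Phi}_{0,1}((-\infty, x)) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2} \, du.
\]

The distribution function of $\boldsymbol{\Phi}_{a,\sigma^2}$ is obviously equal to $\Phi((x - a)/\sigma)$, so that the
parameters $a$ and $\sigma$ have the meaning of the “location” and “scale” of the
distribution.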




The fact that formula (3.2.2) defines a distribution follows from Theorem 3.2.1
and the observation that the function $\Phi(x)$ (or $\Phi((x - a)/\sigma)$) satisfies properties
F1-F3, since $\Phi(-\infty) = 0$, $\Phi(\infty) = 1$, and $\Phi(x)$ is continuous and monotone. One
could also directly use the fact that the integral in (3.2.2) is a countably additive set
function (see Sect. 3.6 and Appendix 3).
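
The location-scale relation can be illustrated numerically. The sketch below uses the standard identity $\Phi(x) = \tfrac{1}{2}(1 + \operatorname{erf}(x/\sqrt{2}))$ and compares $\Phi((x - a)/\sigma)$ with a midpoint-rule approximation of the density integral; the parameter values, the truncation of the lower limit and the function names are assumptions made for this check.

```python
from math import erf, exp, pi, sqrt

def std_normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def normal_cdf(x, a, sigma):
    """Distribution function with location a and scale sigma."""
    return std_normal_cdf((x - a) / sigma)

def normal_cdf_by_quadrature(x, a, sigma, lower_sigmas=10.0, steps=20000):
    """Midpoint rule for the density integral, truncated far in the left tail."""
    lo = a - lower_sigmas * sigma
    h = (x - lo) / steps
    total = 0.0
    for i in range(steps):
        u = lo + (i + 0.5) * h
        total += exp(-(u - a) ** 2 / (2.0 * sigma ** 2))
    return total * h / (sigma * sqrt(2.0 * pi))

a, sigma, x = 1.0, 2.0, 2.5            # assumed values
gap = abs(normal_cdf(x, a, sigma) - normal_cdf_by_quadrature(x, a, sigma))
print(normal_cdf(x, a, sigma), gap)
```

Since the normal distribution function is continuous, the choice between $P(\xi < x)$ and $P(\xi \le x)$ makes no difference here.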

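5. The exponential distribution $\boldsymbol{\Gamma}_\alpha$. The relation $\xi \subset\!\!\!= \boldsymbol{\Gamma}_\alpha$ means that $\xi$ is
nonnegative and

\[
P(\xi \in B) = \boldsymbol{\Gamma}_\alpha(B) = \alpha \int_{B \cap (0, \infty)} e^{-\alpha u} \, du.
\]

The distribution function of $\xi \subset\!\!\!= \boldsymbol{\Gamma}_\alpha$ clearly has the form

\[
P(\xi < x) =
\begin{cases}
1 - e^{-\alpha x} & \text{for } x > 0, \\
0 & \text{for } x \le 0.
\end{cases}
\]

The exponential distribution is a special case of the gamma distribution $\boldsymbol{\Gamma}_{\alpha,\lambda}$, to be
considered in more detail in Sect. 7.7.

6. A discrete analogue of the exponential distribution is called the geometric
distribution. It has the form

\[
P(\xi = k) = (1 - p) p^k, \quad p \in (0, 1), \quad k = 0, 1, \ldots
\]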


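7. The Cauchy distribution $\mathbf{K}_{\alpha,\sigma}$. As was the case with the normal distribution,
this distribution depends on two parameters $\alpha$ and $\sigma$ which are also location and
scale parameters. If $\xi \subset\!\!\!= \mathbf{K}_{\alpha,\sigma}$, then

\[
P(\xi \in B) = \frac{1}{\pi\sigma} \int_B \frac{du}{1 + ((u - \alpha)/\sigma)^2}.
\]

The distribution function $K(x)$ of $\mathbf{K}_{0,1}$ is

\[
K(x) = \frac{1}{\pi} \int_{-\infty}^{x} \frac{du}{1 + u^2}.
\]

The distribution function of $\mathbf{K}_{\alpha,\sigma}$ is equal to $K((x - \alpha)/\sigma)$. All the remarks made
for the normal distribution continue to hold here.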


Example 3.2.1 Suppose that there is a source of radiation at a point $(\alpha, \sigma)$, $\sigma > 0$,
on the plane. The radiation is registered by a detector whose position coincides with
the $x$-axis. An emitted particle moves in a random direction distributed uniformly
over the circle. In other words, the angle $\eta$ between this direction and the vector
$(0, -1)$ has the uniform distribution $\mathbf{U}_{-\pi,\pi}$ on the interval $[-\pi, \pi]$. Observation
results are the coordinates $\xi_1, \xi_2, \ldots$ of the points on the $x$-axis where the particles
interacted with the detector. What is the distribution of the random variable $\xi = \xi_1$?

To find this distribution, consider a particle emitted at the point $(\alpha, \sigma)$ given
that the particle hit the detector (i.e. given that $\eta \in [-\pi/2, \pi/2]$). It is clear that
the conditional distribution of $\eta$ given the last event (of which the probability is
$P(\eta \in [-\pi/2, \pi/2]) = 1/2$) coincides with $\mathbf{U}_{-\pi/2,\pi/2}$. Since $(\xi - \alpha)/\sigma = \tan\eta$,
one obtains that

\[
P(\xi < x) = P(\alpha + \sigma\tan\eta < x)
= P\Bigl( \frac{\eta}{\pi} < \frac{1}{\pi} \arctan\frac{x - \alpha}{\sigma} \Bigr)
= \frac{1}{2} + \frac{1}{\pi} \arctan\frac{x - \alpha}{\sigma}.
\]

Recalling that $(\arctan u)' = 1/(1 + u^2)$, we have

\[
\arctan x = \int_0^x \frac{du}{1 + u^2} = \int_{-\infty}^{x} \frac{du}{1 + u^2} - \frac{\pi}{2},
\]

\[
P(\xi < x) = \frac{1}{\pi} \int_{-\infty}^{(x - \alpha)/\sigma} \frac{du}{1 + u^2} = K\Bigl( \frac{x - \alpha}{\sigma} \Bigr).
\]

Thus the coordinates of the traces on the $x$-axis of the particles emitted from the
point $(\alpha, \sigma)$ have the Cauchy distribution $\mathbf{K}_{\alpha,\sigma}$.
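
The change of variables in this example can be checked without random sampling. The sketch below replaces the uniform angle by an evenly spaced midpoint grid on $[-\pi/2, \pi/2]$ (a deterministic stand-in chosen here so that the check is reproducible) and compares the share of traces left of $x$ with $K((x - \alpha)/\sigma)$. The source position, the level `x` and the grid size are assumed values.

```python
from math import atan, pi, tan

def cauchy_cdf(x, alpha, sigma):
    """K((x - alpha) / sigma) with K(x) = 1/2 + arctan(x) / pi."""
    return 0.5 + atan((x - alpha) / sigma) / pi

def hit_fraction(x, alpha, sigma, n=200000):
    """Share of grid angles in [-pi/2, pi/2] whose trace on the axis lies left of x."""
    hits = 0
    for i in range(n):
        eta = -pi / 2 + (i + 0.5) * pi / n     # midpoint of the i-th cell
        if alpha + sigma * tan(eta) < x:
            hits += 1
    return hits / n

alpha, sigma, x = 1.0, 2.0, 3.0        # assumed source position and level
gap = abs(hit_fraction(x, alpha, sigma) - cauchy_cdf(x, alpha, sigma))
print(cauchy_cdf(x, alpha, sigma), gap)
```

For these values $(x - \alpha)/\sigma = 1$, so the distribution function equals $1/2 + (\pi/4)/\pi = 3/4$, and the grid fraction agrees up to the grid resolution.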

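8. The Poisson distribution $\boldsymbol{\Pi}_\lambda$. We will write $\xi \subset\!\!\!= \boldsymbol{\Pi}_\lambda$ if $\xi$ assumes nonnegative
integer values with probabilities

\[
P(\xi = m) = \frac{\lambda^m}{m!} e^{-\lambda}, \quad \lambda > 0, \quad m = 0, 1, 2, \ldots
\]

The distribution function, as in Example 3.1.1, has the form of a sum:

\[
F(x) =
\begin{cases}
\sum_{m < x} \dfrac{\lambda^m}{m!} e^{-\lambda} & \text{for } x > 0, \\
0 & \text{for } x \le 0.
\end{cases}
\]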


3.2.3 The Three Distribution Types 


All the distributions considered in the above examples can be divided into two types. 


I. Discrete Distributions 


Definition 3.2.2 The distribution of a random variable $\xi$ is called discrete if $\xi$ can
assume only finitely or countably many values $x_1,x_2,\ldots$ so that

$$p_k=\mathbf{P}(\xi=x_k)>0,\qquad \sum_k p_k=1.$$

A discrete distribution $\{p_k\}$ can obviously always be defined on a discrete probability
space. It is often convenient to characterise such a distribution by a table:


Values        | $x_1$  $x_2$  $x_3$  ...
Probabilities | $p_1$  $p_2$  $p_3$  ...


The distributions $\mathbf{I}_a$, $\mathbf{B}^n_p$, $\mathbf{\Pi}_\lambda$, and the geometric distribution are discrete. The
derivative of the distribution function of such a distribution is equal to zero everywhere
except at the points $x_1,x_2,\ldots$ where $F(x)$ is discontinuous, the jumps being

$$F(x_k+0)-F(x_k)=p_k.$$


An important class of discrete distributions is formed by lattice distributions. 




Definition 3.2.3 We say that a random variable $\xi$ has a lattice distribution with span
$h$ if there exist $a$ and $h$ such that

$$\sum_{k=-\infty}^{\infty}\mathbf{P}(\xi=a+kh)=1. \qquad (3.2.3)$$

If $h$ is the greatest number satisfying (3.2.3) and the number $a$ lies in the interval
$[0,h)$ then these numbers are called the span and the shift, respectively, of the lattice.

If a =0 and h = 1 then the distribution is called arithmetic. The same terms will 
be used for random variables. 

Obviously the greatest common divisor (g.c.d.) of all possible values of an arith- 
metic random variable equals 1. 


II. Absolutely Continuous Distributions 


Definition 3.2.4 The distribution $\mathbf{F}$ of a random variable $\xi$ is said to be absolutely
continuous² if, for any Borel set $B$,

$$\mathbf{F}(B)=\mathbf{P}(\xi\in B)=\int_B f(x)\,dx, \qquad (3.2.4)$$

where $f(x)\ge 0$, $\int_{-\infty}^{\infty}f(x)\,dx=1$.


The function $f(x)$ in (3.2.4) is called the density of the distribution.

It is not hard to derive from the proof of Theorem 3.2.1 (to be more precise, from
the theorem on uniqueness of the extension of a measure) that the above definition
of absolute continuity is equivalent to the representation

$$F_\xi(x)=\int_{-\infty}^{x}f(u)\,du$$

for all $x\in\mathbb{R}$. Distribution functions with this property are also called absolutely
continuous.


²The definition refers to absolute continuity with respect to the Lebesgue measure. Given a measure
$\mu$ on $\langle\mathbb{R},\mathfrak{B}\rangle$ (see Appendix 3), a distribution $\mathbf{F}$ is called absolutely continuous with respect to
$\mu$ if, for any $B\in\mathfrak{B}$, one has

$$\mathbf{F}(B)=\int_B f(x)\,\mu(dx).$$

In this sense discrete distributions are also absolutely continuous, but with respect to the counting
measure $m$. Indeed, if one puts $f(x_k)=p_k$, $m(B)=\{$the number of points from the set
$(x_1,x_2,\ldots)$ which are in $B\}$, then

$$\mathbf{F}(B)=\sum_{x_k\in B}p_k=\sum_{x_k\in B}f(x_k)=\int_B f(x)\,m(dx)$$

(see Appendix 3).


Fig. 3.1 The plot shows the result of the first three steps in the construction of the Cantor
function (vertical axis: $F(x)$; horizontal axis: $x$, with marks at 1/9, 2/9, 1/3, 2/3, 7/9, 8/9 and 1)


The function $f(x)$ is determined by the above equalities up to its values on a set
of Lebesgue measure 0. For this function, the relation $f(x)=\frac{dF(x)}{dx}$ holds³ almost
everywhere (with respect to the Lebesgue measure).

The distributions $\mathbf{U}_{a,b}$, $\mathbf{\Phi}_{\alpha,\sigma^2}$, $\mathbf{K}_{\alpha,\sigma}$ and $\mathbf{\Gamma}_\alpha$ are absolutely continuous. The
density of the normal distribution with parameters $\alpha$ and $\sigma$ is equal to

$$\varphi_{\alpha,\sigma^2}(x)=\frac{1}{\sigma\sqrt{2\pi}}\exp\Bigl\{-\frac{(x-\alpha)^2}{2\sigma^2}\Bigr\}.$$

From their definitions, one could easily derive the densities of the distributions $\mathbf{U}_{a,b}$,
$\mathbf{K}_{\alpha,\sigma}$ and $\mathbf{\Gamma}_\alpha$ as well. The density of $\mathbf{K}_{\alpha,\sigma}$ has a shape resembling that of the normal
density, but with “thicker tails” (it vanishes more slowly as $|x|\to\infty$).

We will say that a distribution $\mathbf{F}$ has an atom at point $x_1$ if $\mathbf{F}(\{x_1\})>0$. We saw
that any discrete distribution consists of atoms but, for an absolutely continuous 
distribution, the probability of hitting a set of zero Lebesgue measure is zero. It 
turns out that there exists yet a third class of distributions which is characterised 
by the negation of both mentioned properties of discrete and absolutely continuous 
distributions. 


III. Singular Distributions


Definition 3.2.5 A distribution F is said to be singular (with respect to Lebesgue 
measure) if it has no atoms and is concentrated on a set of zero Lebesgue measure. 


Because a singular distribution has no atoms, its distribution function is continuous.
An example of such a distribution function is given by the famous Cantor function
of which the whole variation is concentrated on the interval $[0,1]$: $F(x)=0$
for $x\le 0$, $F(x)=1$ for $x\ge 1$. It can be constructed as follows (the construction
process is shown in Fig. 3.1).


³The assertion about the “almost everywhere” uniqueness of the function $f$ follows from the
Radon-Nikodym theorem (see Appendix 3). 




Divide the segment [0, 1] into three equal parts [0, 1/3], [1/3, 2/3], and [2/3, 1]. 
On the inner segment put F(x) = 1/2. The remaining two segments are again di- 
vided into three equal parts each, and on the inner parts one sets F(x) to be 1/4 and 
3/4, respectively. Each of the remaining segments is divided in turn into three parts, 
and F(x) is defined on the inner parts as the arithmetic mean of the two already 
defined neighbouring values of F(x), and so on. At the points which do not belong 
to such inner segments F(x) is defined by continuity. It is not hard to see that the 
total length of such “inner” segments on which F(x) is constant is equal to 


$$\frac{1}{3}+\frac{2}{9}+\frac{4}{27}+\cdots=\frac{1}{3}\sum_{k=0}^{\infty}\Bigl(\frac{2}{3}\Bigr)^k=\frac{1}{3}\cdot\frac{1}{1-2/3}=1,$$


so that the function F(x) grows on a set of measure zero but has no jumps. 

From the construction of the Cantor distribution we see that dF (x)/dx = 0 al- 
most everywhere. 
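
The construction can be followed numerically. The sketch below is ours (the names `cantor` and `removed_length` do not come from the text). It evaluates $F$ by reading off the ternary digits of $x$, which is equivalent to the step-by-step construction above: the first digit equal to 1 means that $x$ lies in an "inner" segment, where $F$ is constant. Exact rational arithmetic is used, and the number of digits examined (`depth`) is an arbitrary truncation.

```python
from fractions import Fraction

def cantor(x, depth=60):
    """Cantor function at a rational point x, using `depth` ternary digits."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    value, step = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)           # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:           # x is in an inner segment: F is constant there
            return value + step
        if digit == 2:           # x is in the right-hand remaining segment
            value += step
        step /= 2
    return value

def removed_length(n):
    """Total length of the inner segments produced by the first n steps."""
    return sum(Fraction(2 ** k, 3 ** (k + 1)) for k in range(n))

if __name__ == "__main__":
    print(cantor(Fraction(1, 2)), cantor(Fraction(1, 9)), cantor(Fraction(8, 9)))
    print(removed_length(3), float(removed_length(50)))
```

The partial sums `removed_length(n)` equal $1-(2/3)^n$ and tend to 1, which is the computation of the total length of the inner segments given above.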

It turns out that these three types of distribution exhaust all possibilities. 

More precisely, there is a theorem belonging to Lebesgue⁴ stating that any distri-
bution function F(x) can be represented in a unique way as a sum of three compo- 
nents: discrete, absolutely continuous, and singular. Hence an arbitrary distribution 
function cannot have more than a countable number of jumps (which can also be 
observed directly: we will count all the jumps if we first enumerate all the jumps 
which are greater than 1/2, then the jumps greater than 1/3, then greater than 1/4, 
etc.). This means, in particular, that F(x) is everywhere continuous except perhaps 
at a countable or finite set of points. 

In conclusion of this section we will list several properties of distribution func- 
tions and densities that arise when forming new random variables. 


3.2.4 Distributions of Functions of Random Variables 


For a given function $g(x)$, to find the distribution of $g(\xi)$ we have to impose some
measurability requirements on the function. The function $g(x)$ is called Borel if the
inverse image

$$g^{-1}(B)=\{x: g(x)\in B\}$$

of any Borel set $B$ is again a Borel set. For such a function $g$ the distribution function
of the random variable $\eta=g(\xi)$ equals

$$F_{g(\xi)}(x)=\mathbf{P}(g(\xi)<x)=\mathbf{P}\bigl(\xi\in g^{-1}(-\infty,x)\bigr).$$

If $g(x)$ is continuous and strictly increasing on an interval $(a,b)$ then, on the
interval $(g(a),g(b))$, the inverse function $y=g^{(-1)}(x)$ is defined⁵ as the solution to


⁴See Sect. 3.5 in Appendix 3.




the equation $g(y)=x$. Since $g$ is a monotone mapping we have

$$\{g(\xi)<x\}=\{\xi<g^{(-1)}(x)\}\qquad\text{for } x\in(g(a),g(b)).$$

Thus we get the following representation for $F_{g(\xi)}$ in terms of $F_\xi$: for $x\in(g(a),g(b))$,

$$F_{g(\xi)}(x)=\mathbf{P}\bigl(\xi<g^{(-1)}(x)\bigr)=F_\xi\bigl(g^{(-1)}(x)\bigr). \qquad (3.2.5)$$

Putting $g=F_\xi$ we obtain, in particular, that if $F_\xi$ is continuous and strictly increasing
on $(a,b)$ and $F_\xi(a)=0$, $F_\xi(b)=1$ ($-a$ and $b$ may be $\infty$) then

$$F_{F_\xi(\xi)}(x)=F_\xi\bigl(F_\xi^{(-1)}(x)\bigr)=x$$

for $x\in[0,1]$ and therefore the random variable $\eta=F_\xi(\xi)$ is uniformly distributed
over $[0,1]$.


Definition 3.2.6 The quantile transform $F^{(-1)}(y)$ of an arbitrary distribution $\mathbf{F}$
with the distribution function $F(x)$ is the “generalised” inverse of the function $F$:

$$F^{(-1)}(y):=\sup\{x: F(x)<y\}\quad\text{for } y\in(0,1];$$
$$F^{(-1)}(0):=\inf\{x: F(x)>0\}.$$

In mathematical statistics, the number $F^{(-1)}(y)$ is called the quantile of order $y$
of the distribution $\mathbf{F}$. The function $F^{(-1)}$ has a discontinuity of size $b-a$ at a point
$y$ if $(a,b)$ is the interval on which $F$ is constant and such that $F(x)=y\in[0,1)$.

Roughly speaking, the plot of the function $F^{(-1)}$ can be obtained from that of the
function $F(x)$ on the $(x,y)$ plane in the following way: rotate the $(x,y)$ plane in
the counter clockwise direction by 90°, so that the $x$-axis becomes the ordinate axis,
but the $y$-axis becomes the abscissa axis directed to the left. To switch to normal
coordinates, we have to reverse the direction of the new $x$-axis.

Further, if $x$ is a point of continuity and a point of growth of the function $F$ (i.e.,
$F(x)$ is a point of continuity of $F^{(-1)}$) then $F^{(-1)}(y)$ is the unique solution of the
equation $F(x)=y$ and the equality $F(F^{(-1)}(y))=y$ holds.
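
For a discrete distribution the generalised inverse can be written out explicitly. The sketch below is ours (the function name `quantile` and its argument layout are assumptions, not notation from the text). It assumes the values are listed in increasing order and uses the left-continuous convention $F(x)=\mathbf{P}(\xi<x)$: then $\sup\{x: F(x)<y\}$ is the first value $x_j$ at which the cumulative sum $p_1+\cdots+p_j$ reaches $y$.

```python
from fractions import Fraction
from itertools import accumulate

def quantile(values, probs, y):
    """Generalised inverse F^(-1)(y) of a discrete distribution.

    `values` must be increasing and `probs` are the matching probabilities.
    """
    if not 0 <= y <= 1:
        raise ValueError("y must lie in [0, 1]")
    if y == 0:
        return values[0]                 # inf{x : F(x) > 0} is the smallest value
    for x, cumulative in zip(values, accumulate(probs)):
        if cumulative >= y:              # F(x') < y exactly for x' <= x
            return x
    return values[-1]                    # only reached through float rounding

if __name__ == "__main__":
    values = [0, 1, 2]
    probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
    for y in (0, Fraction(1, 4), Fraction(3, 10), Fraction(3, 4), 1):
        print(y, quantile(values, probs, y))
```

With this convention the set of $y$ mapped to $x_j$ is the interval $(p_1+\cdots+p_{j-1},\,p_1+\cdots+p_j]$, whose length is $p_j$.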

In some cases the following statement proves to be useful. 


Theorem 3.2.2 Let $\eta\in\mathbf{U}_{0,1}$. Then, for any distribution $\mathbf{F}$,

$$F^{(-1)}(\eta)\in\mathbf{F}.$$

Proof If $F(x)>y$ then $F^{(-1)}(y)=\sup\{v: F(v)<y\}<x$, and vice versa: if
$F(x)<y$ then $F^{(-1)}(y)\ge x$ (recall that $F(x)$ is left-continuous). Therefore the
following inclusions are valid for the sets in the $(x,y)$ plane:

$$\{y<F(x)\}\subset\{F^{(-1)}(y)<x\}\subset\{y\le F(x)\}.$$


⁵For an arbitrary non-decreasing function $g$, the inverse function $g^{(-1)}(x)$ is defined by the
equation

$$g^{(-1)}(y)=\inf\{x: g(x)\ge y\}=\sup\{x: g(x)<y\}.$$


Substituting $\eta\in\mathbf{U}_{0,1}$ in place of $y$ in these relations yields that, for any $x$, such
inclusions hold for the respective events, and hence

$$\mathbf{P}\bigl(F^{(-1)}(\eta)<x\bigr)=\mathbf{P}(\eta<F(x))=F(x).$$


The theorem is proved. 


Thus we have obtained an important method for constructing random variables
with prescribed distributions from uniformly distributed random variables. For
instance, if $\eta\in\mathbf{U}_{0,1}$ then $\xi=-(1/\alpha)\ln\eta\in\mathbf{\Gamma}_\alpha$.
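
This recipe is easy to try out. The following sketch rests on assumptions of ours: the function name `sample_exponential`, the sample size and the seed are illustrative, and the exponential distribution with parameter $\alpha$ is taken to have distribution function $1-e^{-\alpha x}$ for $x>0$. It generates $\xi=-(1/\alpha)\ln\eta$ from uniform $\eta$ and compares the empirical mean and distribution function with the exponential ones.

```python
import math
import random

def sample_exponential(alpha, n, seed=0):
    """Draw n values xi = -(1/alpha) * ln(eta) with eta uniform on (0, 1]."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        eta = 1.0 - rng.random()      # uniform on (0, 1], so the log is finite
        sample.append(-math.log(eta) / alpha)
    return sample

if __name__ == "__main__":
    alpha = 2.0
    sample = sample_exponential(alpha, 100_000)
    mean = sum(sample) / len(sample)
    below = sum(1 for s in sample if s < 0.5) / len(sample)
    print(mean, 1 / alpha)                      # empirical and exact mean
    print(below, 1 - math.exp(-alpha * 0.5))    # empirical and exact F(0.5)
```

Taking `1.0 - rng.random()` instead of `rng.random()` only avoids the value 0, at which the logarithm is undefined; both are uniformly distributed.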

In another special case, when $g(x)=a+bx$, $b>0$, from (3.2.5) we get $F_{g(\xi)}(x)=F_\xi((x-a)/b)$.
We have already used this relation to some extent when considering
the distributions $\mathbf{\Phi}_{\alpha,\sigma^2}$ and $\mathbf{K}_{\alpha,\sigma}$.

If a function $g$ is strictly increasing and differentiable (the inverse function $g^{(-1)}$
is defined in this case), and $\xi$ has a density $f(x)$, then there exists a density for $g(\xi)$
which is equal to

$$f_{g(\xi)}(y)=f\bigl(g^{(-1)}(y)\bigr)\bigl(g^{(-1)}(y)\bigr)'=f(x)\frac{dx}{dy},$$

where $x=g^{(-1)}(y)$, $y=g(x)$. A similar argument for decreasing $g$ leads to the
general formula

$$f_{g(\xi)}(y)=f(x)\left|\frac{dx}{dy}\right|.$$

For $g(x)=a+bx$, $b\ne 0$, one obtains

$$f_{a+b\xi}(y)=\frac{1}{|b|}\,f\Bigl(\frac{y-a}{b}\Bigr).$$

3.3 Multivariate Random Variables 


Let $\xi_1,\xi_2,\ldots,\xi_n$ be random variables given on a common probability space
$\langle\Omega,\mathfrak{F},\mathbf{P}\rangle$. To each $\omega$, these random variables put into correspondence an
$n$-dimensional vector $\xi(\omega)=(\xi_1(\omega),\xi_2(\omega),\ldots,\xi_n(\omega))$.


Definition 3.3.1 A mapping $\Omega\to\mathbb{R}^n$ given by random variables $\xi_1,\xi_2,\ldots,\xi_n$ is
called a random vector or multivariate random variable.


Such a mapping $\Omega\to\mathbb{R}^n$ is a measurable mapping of the space $\langle\Omega,\mathfrak{F}\rangle$ into the
space $\langle\mathbb{R}^n,\mathfrak{B}^n\rangle$, where $\mathfrak{B}^n$ is the $\sigma$-algebra of Borel sets in $\mathbb{R}^n$. Therefore, for Borel
sets $B$, the function $\mathbf{F}_\xi(B)=\mathbf{P}(\xi\in B)$ is defined.


Definition 3.3.2 The function $\mathbf{F}_\xi(B)$ is called the distribution of the vector $\xi$.
The function

$$F_{\xi_1\ldots\xi_n}(x_1,\ldots,x_n)=\mathbf{P}(\xi_1<x_1,\ldots,\xi_n<x_n)$$

is called the distribution function of the random vector $(\xi_1,\ldots,\xi_n)$ or joint
distribution function of the random variables $\xi_1,\ldots,\xi_n$.


The following properties of the distribution functions of random vectors, analogous
to properties F1–F3 in Sect. 3.2, hold true.

FF1. Monotonicity: “Multiple” differences of the values of the function $F_{\xi_1\ldots\xi_n}$,
which correspond to probabilities of hitting arbitrary “open at the right” parallelepipeds,
are nonnegative. For instance, in the two-dimensional case this means
that, for any $x_1<x_2$, $y_1<y_2$ (the points $(x_1,y_1)$ and $(x_2,y_2)$ being the “extreme”
vertices of the parallelepiped),

$$F_{\xi_1,\xi_2}(x_2,y_2)-F_{\xi_1,\xi_2}(x_2,y_1)-\bigl(F_{\xi_1,\xi_2}(x_1,y_2)-F_{\xi_1,\xi_2}(x_1,y_1)\bigr)\ge 0.$$

This double difference is nothing else but the probability of hitting the “semi-open”
parallelepiped $[x_1,x_2)\times[y_1,y_2)$ by $\xi$.

In other words, the differences

$$F_{\xi_1,\xi_2}(t,y_2)-F_{\xi_1,\xi_2}(t,y_1)\qquad\text{for } y_1<y_2$$

must be monotone in $t$. (For this to hold, the monotonicity of the function
$F_{\xi_1,\xi_2}(t,y_1)$ is not sufficient.)

FF2. The second property can be called consistency.

$$\lim_{x_n\to\infty}F_{\xi_1\ldots\xi_n}(x_1,\ldots,x_n)=F_{\xi_1\ldots\xi_{n-1}}(x_1,\ldots,x_{n-1}),$$
$$\lim_{x_n\to-\infty}F_{\xi_1\ldots\xi_n}(x_1,\ldots,x_n)=0.$$

FF3. Left-continuity:

$$\lim_{x_n'\uparrow x_n}F_{\xi_1\ldots\xi_n}(x_1,\ldots,x_n')=F_{\xi_1\ldots\xi_n}(x_1,\ldots,x_n).$$

That the limits in properties FF2 and FF3 are taken in the last variable is inessential,
for one can always renumber the components of the vectors.

One can prove these properties in the same way as in the one-dimensional case.
As above, any function $F(x_1,\ldots,x_n)$ possessing this collection of properties will
be the distribution function of a (multivariate) random variable.

As in the one-dimensional case, when considering random vectors $\xi=(\xi_1,\ldots,\xi_n)$,
we can make use of the simplest sample model of the probability space
$\langle\Omega,\mathfrak{F},\mathbf{P}\rangle$. Namely, let $\Omega$ coincide with $\mathbb{R}^n$ and $\mathfrak{F}=\mathfrak{B}^n$ be the $\sigma$-algebra of Borel
sets. We will complete the construction of the required probability space if we put
$\mathbf{F}(B)=\mathbf{F}_\xi(B)=\mathbf{P}(\xi\in B)$ for any $B\in\mathfrak{B}^n$. It remains to define the random variable
as the value of the elementary event itself, i.e. to put $\xi(\omega)=\omega$, where $\omega$ is a
point in $\mathbb{R}^n$.

It is not hard to see that the distribution function $F_{\xi_1\ldots\xi_n}$ uniquely determines the
distribution $\mathbf{F}_\xi(B)$. Indeed, $F_{\xi_1\ldots\xi_n}$ defines a probability on the algebra $\mathcal{A}$ generated
by rectangles $\{a_i\le x_i<b_i;\ i=1,\ldots,n\}$. For example, in the two-dimensional
case




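$$\begin{aligned}
\mathbf{P}(a_1\le\xi_1<b_1,\ a_2\le\xi_2<b_2)
&=\mathbf{P}(\xi_1<b_1,\ a_2\le\xi_2<b_2)-\mathbf{P}(\xi_1<a_1,\ a_2\le\xi_2<b_2)\\
&=\bigl[F_{\xi_1,\xi_2}(b_1,b_2)-F_{\xi_1,\xi_2}(b_1,a_2)\bigr]-\bigl[F_{\xi_1,\xi_2}(a_1,b_2)-F_{\xi_1,\xi_2}(a_1,a_2)\bigr].
\end{aligned}$$

But $\mathfrak{B}^n=\sigma(\mathcal{A})$, and it remains to make use of the measure extension theorem (see
Sect. 3.2.1).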
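
The two-dimensional rectangle formula is easy to evaluate for a concrete joint distribution function. The sketch below is ours: the names `rectangle_probability`, `uniform_square_cdf` and `not_a_cdf` are illustrative, and the joint distribution function of two independent coordinates uniform on $[0,1]$ is an example chosen by us. The second function shows why property FF1 asks for more than monotonicity in each argument.

```python
def rectangle_probability(F, a, b):
    """P(a1 <= xi1 < b1, a2 <= xi2 < b2) computed from a joint distribution function F."""
    (a1, a2), (b1, b2) = a, b
    return (F(b1, b2) - F(b1, a2)) - (F(a1, b2) - F(a1, a2))

def uniform_square_cdf(x, y):
    """Joint distribution function of two independent coordinates uniform on [0, 1]."""
    clip = lambda t: min(max(t, 0.0), 1.0)
    return clip(x) * clip(y)

def not_a_cdf(x, y):
    """Non-decreasing in each argument, yet not a distribution function."""
    return 1.0 if x + y > 1 else 0.0

if __name__ == "__main__":
    print(rectangle_probability(uniform_square_cdf, (0.2, 0.1), (0.5, 0.4)))
    print(rectangle_probability(not_a_cdf, (0.2, 0.2), (0.9, 0.9)))
```

For the uniform example the result is the area of the rectangle, while for `not_a_cdf` the double difference is negative, so that function violates FF1.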

Thus from a distribution function $F_{\xi_1\ldots\xi_n}=F$ one can always construct a sample
probability space $\langle\mathbb{R}^n,\mathfrak{B}^n,\mathbf{F}_\xi\rangle$ and a random variable $\xi(\omega)=\omega$ on it so that the
latter will have the prescribed distribution $\mathbf{F}_\xi$.

As in the one-dimensional case, we say that the distribution of a random vector
is discrete if the random vector assumes at most a countable set of values.

The distribution of a random vector will be absolutely continuous if, for any
Borel set $B\subset\mathbb{R}^n$,

$$\mathbf{F}_\xi(B)=\mathbf{P}(\xi\in B)=\int_B f(x)\,dx,$$

where clearly $f(x)\ge 0$ and $\int_{\mathbb{R}^n}f(x)\,dx=1$.
This definition can be replaced with an equivalent one requiring that

$$F_{\xi_1\ldots\xi_n}(x_1,\ldots,x_n)=\int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_n}f(t_1,\ldots,t_n)\,dt_1\cdots dt_n. \qquad (3.3.1)$$

Indeed, if (3.3.1) holds, we define a countably additive set function

$$Q(B)=\int_B f(x)\,dx$$

(see properties of integrals in Appendix 3), which will coincide on rectangles
with $\mathbf{F}_\xi$. Consequently, $\mathbf{F}_\xi(B)=Q(B)$.

The function $f(x)$ is called the density of the distribution of $\xi$ or density of the
joint distribution of $\xi_1,\ldots,\xi_n$. The equality

$$\frac{\partial^n}{\partial x_1\cdots\partial x_n}F_{\xi_1\ldots\xi_n}(x_1,\ldots,x_n)=f(x_1,\ldots,x_n)$$

holds for this function almost everywhere.

If a random vector $\xi$ has density $f(x_1,\ldots,x_n)$, then clearly any “subvector”
$(\xi_{k_1},\ldots,\xi_{k_s})$, $k_i\le n$, also has a density equal (let for the sake of simplicity $k_i=i$,
$i=1,\ldots,s$) to

$$f(x_1,\ldots,x_s)=\int\cdots\int f(x_1,\ldots,x_n)\,dx_{s+1}\cdots dx_n.$$


Let continuously differentiable functions $y_i=g_i(x_1,\ldots,x_n)$ be given in a region
$A\subset\mathbb{R}^n$. Suppose they are univalently resolvable for $x_1,\ldots,x_n$: there exist functions
$x_i=g_i^{(-1)}(y_1,\ldots,y_n)$, and the Jacobian $J=|\partial x_i/\partial y_j|\ne 0$ in $A$. Denote by $B$ the
image of $A$ in the range of $(y_1,\ldots,y_n)$. Suppose further that a random vector
$\xi=(\xi_1,\ldots,\xi_n)$ has a density $f_\xi(x)$. Then $\eta_i=g_i(\xi_1,\ldots,\xi_n)$ will be random variables
with a joint density which, at a point $(y_1,\ldots,y_n)\in B$, is equal to

$$f_\eta(y_1,\ldots,y_n)=f_\xi(x_1,\ldots,x_n)\,|J|; \qquad (3.3.2)$$

moreover

$$\begin{aligned}
\mathbf{P}(\xi\in A)&=\int_A f_\xi(x_1,\ldots,x_n)\,dx_1\cdots dx_n=\int_B f_\xi(x_1,\ldots,x_n)\,|J|\,dy_1\cdots dy_n\\
&=\int_B f_\eta(y_1,\ldots,y_n)\,dy_1\cdots dy_n=\mathbf{P}(\eta\in B).
\end{aligned} \qquad (3.3.3)$$

This is clearly an extension to the multi-dimensional case of the property of densities
discussed at the end of Sect. 3.2. Formula (3.3.3) for integrals is well-known in
calculus as the change of variables formula and could serve as a proof of (3.3.2).

The distribution $\mathbf{F}_\xi$ of a random vector $\xi$ is called singular if the distribution has
no atoms ($\mathbf{F}_\xi(\{x\})=0$ for any $x\in\mathbb{R}^n$) and is concentrated on a set of zero Lebesgue
measure.

Consider the following two important examples of multivariate distributions (we
continue the list of the most common distributions from Sect. 3.2).

9. The multinomial distribution $\mathbf{B}^n_p$. We use here the same symbol $\mathbf{B}^n_p$ as we used
for the binomial distribution. The only difference is that now by $p$ we understand a
vector $p=(p_1,\ldots,p_r)$, $p_j\ge 0$, $\sum_{j=1}^r p_j=1$, which could be interpreted as the
collection of probabilities of disjoint events $A_j$, $\bigcup_j A_j=\Omega$. For an integer-valued
random vector $\nu=(\nu_1,\ldots,\nu_r)$, we will write $\nu\in\mathbf{B}^n_p$ if for $k=(k_1,\ldots,k_r)$, $k_j\ge 0$,
$\sum_{j=1}^r k_j=n$ one has

$$\mathbf{P}(\nu=k)=\frac{n!}{k_1!\cdots k_r!}\,p_1^{k_1}\cdots p_r^{k_r}. \qquad (3.3.4)$$

On the right-hand side we have a term from the expansion of the polynomial
$(p_1+\cdots+p_r)^n$ into powers of $p_1,\ldots,p_r$. This explains the name of the distribution. If
$p$ is a number, then evidently $\mathbf{B}^n_p=\mathbf{B}^n_{(p,1-p)}$, so that the binomial distribution is a
multinomial distribution with $r=2$.

The numbers $\nu_j$ could be interpreted as the frequencies of the occurrence of
events $A_j$ in $n$ independent trials, the probability of occurrence of $A_j$ in a trial
being $p_j$. Indeed, the probability of any fixed sequence of outcomes containing
$k_1,\ldots,k_r$ outcomes $A_1,\ldots,A_r$, respectively, is equal to $p_1^{k_1}\cdots p_r^{k_r}$, and the number
of different sequences of this kind is equal to $n!/(k_1!\cdots k_r!)$ (of $n!$ permutations we
leave only those which differ by more than merely permutations of elements inside
the groups of $k_1,\ldots,k_r$ elements). The result will be the probability (3.3.4).
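
Formula (3.3.4) can be evaluated exactly with rational arithmetic. The sketch below is ours (the function name `multinomial_pmf` and the sample probabilities are illustrative). It also confirms, for one small case, that the probabilities sum to one over all $k$ with $k_1+\cdots+k_r=n$.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def multinomial_pmf(k, p):
    """P(nu = k) = n!/(k_1! ... k_r!) * p_1^k_1 ... p_r^k_r with n = sum(k)."""
    coef = factorial(sum(k))
    prob = Fraction(1)
    for kj, pj in zip(k, p):
        coef //= factorial(kj)           # each partial quotient is an integer
        prob *= Fraction(pj) ** kj
    return coef * prob

if __name__ == "__main__":
    p = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
    n = 4
    total = sum(multinomial_pmf(k, p)
                for k in product(range(n + 1), repeat=len(p)) if sum(k) == n)
    print(total)                                             # exact sum over all k
    print(multinomial_pmf((2, 1), (Fraction(1, 4), Fraction(3, 4))))  # binomial case r = 2
```

With $r=2$ the function reduces to the binomial probabilities, in line with the remark that the binomial distribution is a multinomial distribution with $r=2$.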


Example 3.3.1 The simplest model of a chess tournament with two players could
be as follows. In each game, independently of the outcomes of the past games, the
1st player wins with probability $p$, loses with probability $q$, and makes a draw with
probability $1-p-q$. In that case the probability that, in $n$ games, the 1st player
wins $i$ and loses $j$ games ($i+j\le n$), is

$$p(n;i,j)=\frac{n!}{i!\,j!\,(n-i-j)!}\,p^i q^j(1-p-q)^{n-i-j}.$$

Suppose that the tournament goes on until one of the players wins $N$ games (and
thereby wins the tournament). If we denote by $\eta$ the duration of the tournament (the
number of games played before its end) then

$$\mathbf{P}(\eta=n)=\sum_{i=0}^{N-1}p(n-1;N-1,i)\,p+\sum_{i=0}^{N-1}p(n-1;i,N-1)\,q.$$
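
The distribution of the duration $\eta$ can be tabulated directly from these two formulas. In the sketch below the function names `games_pmf` and `duration_pmf` are ours, and $p(n;i,j)$ is taken to be zero when $i+j>n$, which is an interpretation we add so that the sums are well defined for small $n$.

```python
from fractions import Fraction
from math import factorial

def games_pmf(n, i, j, p, q):
    """p(n; i, j): the 1st player wins i and loses j of n games."""
    if i < 0 or j < 0 or i + j > n:
        return Fraction(0)
    draws = n - i - j
    coef = factorial(n) // (factorial(i) * factorial(j) * factorial(draws))
    return coef * p ** i * q ** j * (1 - p - q) ** draws

def duration_pmf(n, N, p, q):
    """P(eta = n): some player scores the N-th win exactly in game n."""
    first = sum(games_pmf(n - 1, N - 1, i, p, q) for i in range(N)) * p
    second = sum(games_pmf(n - 1, i, N - 1, p, q) for i in range(N)) * q
    return first + second

if __name__ == "__main__":
    p, q = Fraction(1, 2), Fraction(1, 3)
    for n in range(1, 8):
        print(n, duration_pmf(n, 2, p, q))
```

For $N=1$ the result is $(1-p-q)^{n-1}(p+q)$: the tournament lasts until the first decisive game.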


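10. The multivariate normal (or Gaussian) distribution $\mathbf{\Phi}_{\alpha,\sigma^2}$. Let $\alpha=(\alpha_1,\ldots,\alpha_r)$
be a vector and $\sigma^2=\|\sigma_{ij}\|$, $i,j=1,\ldots,r$, a symmetric positive definite
matrix, and $A=\|a_{ij}\|$ the matrix inverse to $\sigma^2=A^{-1}$. We will say that a vector
$\xi=(\xi_1,\ldots,\xi_r)$ has the normal distribution: $\xi\in\mathbf{\Phi}_{\alpha,\sigma^2}$, if it has the density

$$\varphi_{\alpha,\sigma^2}(x)=\frac{\sqrt{|A|}}{(2\pi)^{r/2}}\exp\Bigl\{-\frac{1}{2}(x-\alpha)A(x-\alpha)^T\Bigr\}.$$

Here $T$ denotes transposition:

$$xAx^T=\sum_{i,j}a_{ij}x_ix_j.$$

It is not hard to verify that

$$\int\varphi_{\alpha,\sigma^2}(x)\,dx_1\cdots dx_r=1$$

(see also Sect. 7.6).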


3.4 Independence of Random Variables and Classes of Events 


3.4.1 Independence of Random Vectors 


Definition 3.4.1 Random variables $\xi_1,\ldots,\xi_n$ are said to be independent if

$$\mathbf{P}(\xi_1\in B_1,\ldots,\xi_n\in B_n)=\mathbf{P}(\xi_1\in B_1)\cdots\mathbf{P}(\xi_n\in B_n) \qquad (3.4.1)$$

for any Borel sets $B_1,\ldots,B_n$ on the real line.


One can introduce the notion of a sequence of independent random variables. The
random variables from the sequence $\{\xi_n\}_{n=1}^{\infty}$, given on a probability space $\langle\Omega,\mathfrak{F},\mathbf{P}\rangle$,
are independent if (3.4.1) holds for any integer $n$ so that the independence of a
sequence of random variables reduces to that of any finite collection of random
variables from this sequence. As we will see below, for a sequence of independent
random variables, any two events related to disjoint groups of random variables
from the sequence are independent.

Another possible definition of independence of random variables follows from 
the assertion below. 


Theorem 3.4.1 Random variables $\xi_1,\ldots,\xi_n$ are independent if and only if

$$F_{\xi_1\ldots\xi_n}(x_1,\ldots,x_n)=F_{\xi_1}(x_1)\cdots F_{\xi_n}(x_n).$$

The proof of the theorem is given in the third part of the present section.


An important criterion of independence in the case when the distribution of
$\xi=(\xi_1,\ldots,\xi_n)$ is absolutely continuous is given in the following theorem.


Theorem 3.4.2 Let random variables $\xi_1,\ldots,\xi_n$ have densities $f_1(x),\ldots,f_n(x)$,
respectively. Then for the independence of $\xi_1,\ldots,\xi_n$ it is necessary and sufficient
that the vector $\xi=(\xi_1,\ldots,\xi_n)$ has a density $f(x_1,\ldots,x_n)$ which is equal to

$$f(x_1,\ldots,x_n)=f_1(x_1)\cdots f_n(x_n).$$

Thus, if it turns out that the density of $\xi$ equals the product of densities of $\xi_j$, that
will mean that the random variables $\xi_j$ are independent.

We leave it to the reader to verify, using this theorem, that the components of a
normal vector $(\xi_1,\ldots,\xi_n)$ are independent if and only if $a_{ij}=0$, $\sigma_{ij}=0$ for $i\ne j$.


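Proof of Theorem 3.4.2 If the distribution function of the random variable $\xi_i$ is given
by

$$F_{\xi_i}(x_i)=\int_{-\infty}^{x_i}f_i(t_i)\,dt_i$$

and $\xi_i$ are independent, then the joint distribution function will be defined by the
formula

$$\begin{aligned}
F_{\xi_1\ldots\xi_n}(x_1,\ldots,x_n)&=F_{\xi_1}(x_1)\cdots F_{\xi_n}(x_n)\\
&=\int_{-\infty}^{x_1}f_1(t_1)\,dt_1\cdots\int_{-\infty}^{x_n}f_n(t_n)\,dt_n\\
&=\int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_n}f_1(t_1)\cdots f_n(t_n)\,dt_1\cdots dt_n.
\end{aligned}$$

Conversely, assuming that

$$F_{\xi_1\ldots\xi_n}(x_1,\ldots,x_n)=\int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_n}f_1(t_1)\cdots f_n(t_n)\,dt_1\cdots dt_n,$$

we come to the equality

$$F_{\xi_1\ldots\xi_n}(x_1,\ldots,x_n)=F_{\xi_1}(x_1)\cdots F_{\xi_n}(x_n).$$

The theorem is proved.


Now consider the discrete case. Assume for the sake of simplicity that the components
of $\xi$ may assume only integral values. Then for the independence of $\xi_j$ it is
necessary and sufficient that, for all $k_1,\ldots,k_n$,

$$\mathbf{P}(\xi_1=k_1,\ldots,\xi_n=k_n)=\mathbf{P}(\xi_1=k_1)\cdots\mathbf{P}(\xi_n=k_n).$$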


Verifying this assertion causes no difficulties, and we leave it to the reader. 
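
The discrete criterion lends itself to a mechanical check when the joint distribution has finitely many values. The following sketch is ours (the function name `is_independent` and the two example tables are illustrative). It computes the marginal distributions from a joint table and compares every joint probability with the product of the marginals.

```python
from fractions import Fraction
from itertools import product

def is_independent(joint):
    """joint maps tuples (k_1, ..., k_n) to P(xi_1 = k_1, ..., xi_n = k_n)."""
    n = len(next(iter(joint)))
    marginals = [{} for _ in range(n)]
    for ks, pr in joint.items():
        for i, k in enumerate(ks):
            marginals[i][k] = marginals[i].get(k, 0) + pr
    # Compare the joint probability with the product of marginals for every
    # combination of values, including combinations of joint probability zero.
    for ks in product(*(m.keys() for m in marginals)):
        expected = 1
        for i, k in enumerate(ks):
            expected *= marginals[i][k]
        if joint.get(ks, 0) != expected:
            return False
    return True

if __name__ == "__main__":
    fair_digits = {(a, b): Fraction(1, 4) for a in (0, 1) for b in (0, 1)}
    equal_digits = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}
    print(is_independent(fair_digits), is_independent(equal_digits))
```

Exact fractions are used so that the equality test is not affected by rounding; with floating-point probabilities a tolerance would be needed.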

The notion of independence is very important for Probability Theory and will be 
used throughout the entire book. Assume that we are formalising a practical problem 
(constructing an appropriate probability model in which various random variables 
are to be present). How can one find out whether the random variables (or events) 
to appear in the model are independent? In such situations it is a justified rule to 
consider events and random variables with no causal connection as independent. 




The detection of “probabilistic” independence in a mathematical model of a 
random phenomenon is often connected with a deep understanding of its physical 
essence. 

Consider some simple examples. For instance, it is known that the probability 
of a new-born child to be a boy (event A) has a rather stable value P(A) = 22/43. 
If B denotes the condition that the child is born on the day of the conjunction of 
Jupiter and Mars, then, under the assumption that the position of the planets does not 
determine individual fates of humans, the conditional probability P(A|B) will have 
the same value: P(A|B) = 22/43. That is, the actual counting of the frequency of 
births of boys under these specific astrological conditions would give just the value 
22/43. Although such a counting might never have been carried out at a sufficiently 
large scale, we have no grounds to doubt its results. 

Nevertheless, one should not treat the connection between “mathematical” and 
causal independence as an absolute one. For instance, by Newton’s law of gravita- 
tion the flight of a missile undoubtedly influences the simultaneous flight of another 
missile. But it is evident that in practice one can ignore this influence. This example 
also shows that independence of events and variables in the concrete and relative 
meaning of this term does not contradict the principle of the universal interdepen- 
dence of all events. 

It is also interesting to note that the formal definition of independence of events or 
random variables is much wider than the notion of real independence in the sense of 
affiliation to causally unrelated phenomena. This follows from the fact that “math- 
ematical” independence can take place in such cases when one has no reason for 
assuming no causal relation. We illustrate this statement by the following example. 
Let $\eta$ be a random variable uniformly distributed over $[0,1]$. Then in the expansion
of $\eta$ into a binary fraction

$$\eta=\sum_{k=1}^{\infty}\xi_k 2^{-k}$$

the random variables $\xi_k$ will be independent (see Example 11.3.1), although they all
have a related origin.

One can see that this circumstance only enlarges the area of applicability of all
the assertions we obtain below under the formal condition of independence.⁶

The notion of independence of random variables is closely connected with that
of independence of $\sigma$-algebras.


3.4.2 Independence of Classes of Events 


Let $\langle\Omega,\mathfrak{F},\mathbf{P}\rangle$ be a probability space and $\mathcal{A}_1$ and $\mathcal{A}_2$ classes of events from the
$\sigma$-algebra $\mathfrak{F}$.


⁶For a more detailed discussion of connections between causal and probabilistic independence, see
[24], from where we borrowed the above examples.




Definition 3.4.2 The classes of events $\mathcal{A}_1$ and $\mathcal{A}_2$ are said to be independent if, for
any events $A_1$ and $A_2$ such that $A_1\in\mathcal{A}_1$ and $A_2\in\mathcal{A}_2$, one has

$$\mathbf{P}(A_1A_2)=\mathbf{P}(A_1)\mathbf{P}(A_2).$$

The following definition introduces the notion of independence of a sequence of
classes of events.


Definition 3.4.3 Classes of events $\{\mathcal{A}_n\}_{n=1}^{\infty}$ are independent if, for any collection of
integers $n_1,\ldots,n_k$,

$$\mathbf{P}\Bigl(\bigcap_{j=1}^{k}A_{n_j}\Bigr)=\prod_{j=1}^{k}\mathbf{P}(A_{n_j})$$

for any $A_{n_j}\in\mathcal{A}_{n_j}$.


For instance, in a sequence of independent trials, the sub-o-algebras of events 
related to different trials will be independent. The independence of a sequence of 
algebras of events also reduces to the independence of any finite collection of alge- 
bras from the sequence. It is clear that subalgebras of events of independent algebras 
are also independent. 


Theorem 3.4.3 $\sigma$-algebras $\mathfrak{A}_1$ and $\mathfrak{A}_2$ generated, respectively, by independent
algebras of events $\mathcal{A}_1$ and $\mathcal{A}_2$ are independent.


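Before proving this assertion we will obtain an approximation theorem which
will be useful for the sequel. By virtue of the theorem, any event $A$ from the
$\sigma$-algebra $\mathfrak{A}$ generated by an algebra $\mathcal{A}$ can, in a sense, be approximated by events
from $\mathcal{A}$. To be more precise, we introduce the “distance” between events defined by

$$d(A,B)=\mathbf{P}(A\overline{B}\cup\overline{A}B)=\mathbf{P}(A\overline{B})+\mathbf{P}(\overline{A}B)=\mathbf{P}(A-B)+\mathbf{P}(B-A).$$

This distance possesses the following properties:

$$\begin{aligned}
&d(\overline{A},\overline{B})=d(A,B),\\
&d(A,C)\le d(A,B)+d(B,C),\\
&d(AB,CD)\le d(A,C)+d(B,D),\\
&|\mathbf{P}(A)-\mathbf{P}(B)|\le d(A,B).
\end{aligned} \qquad (3.4.2)$$

The first relation is obvious. The triangle inequality follows from the fact that

$$\begin{aligned}
d(A,C)&=\mathbf{P}(A\overline{C})+\mathbf{P}(\overline{A}C)=\mathbf{P}(A\overline{C}B)+\mathbf{P}(A\overline{C}\,\overline{B})+\mathbf{P}(\overline{A}CB)+\mathbf{P}(\overline{A}C\overline{B})\\
&\le\mathbf{P}(\overline{C}B)+\mathbf{P}(A\overline{B})+\mathbf{P}(\overline{A}B)+\mathbf{P}(C\overline{B})=d(A,B)+d(B,C).
\end{aligned}$$

The third relation in (3.4.2) can be obtained in a similar way by enlarging events
under the probability sign. Finally, the last inequality in (3.4.2) is a consequence of
the relations

$$\mathbf{P}(A)=\mathbf{P}(AB)+\mathbf{P}(A\overline{B})=\mathbf{P}(B)-\mathbf{P}(B\overline{A})+\mathbf{P}(A\overline{B}).$$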
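
The four properties of the distance $d$ can be confirmed by brute force on a small finite probability space. The sketch below is ours: the four-point space, its weights and all function names are chosen purely for illustration. It enumerates all events and checks each relation in (3.4.2), reading the first one as a statement about complements.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = frozenset(range(4))
WEIGHT = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}

def prob(event):
    """Probability of an event (a subset of OMEGA)."""
    return sum((WEIGHT[w] for w in event), Fraction(0))

def dist(a, b):
    """d(A, B) = P(A - B) + P(B - A)."""
    return prob(a - b) + prob(b - a)

def all_events():
    points = sorted(OMEGA)
    return [frozenset(c) for r in range(len(points) + 1)
            for c in combinations(points, r)]

def check_properties():
    """Check the four relations of (3.4.2) for every choice of events."""
    events = all_events()
    for a in events:
        for b in events:
            assert dist(OMEGA - a, OMEGA - b) == dist(a, b)
            assert abs(prob(a) - prob(b)) <= dist(a, b)
            for c in events:
                assert dist(a, c) <= dist(a, b) + dist(b, c)
                for d in events:
                    assert dist(a & b, c & d) <= dist(a, c) + dist(b, d)
    return True

if __name__ == "__main__":
    print(check_properties())
```

Such a check is of course no proof; it only illustrates the inequalities on one example.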




Theorem 3.4.4 (The approximation theorem) Let $\langle\Omega,\mathfrak{F},\mathbf{P}\rangle$ be a probability space
and $\mathfrak{A}$ the $\sigma$-algebra generated by an algebra $\mathcal{A}$ of events from $\mathfrak{F}$. Then, for any
$A\in\mathfrak{A}$, there exists a sequence $A_n\in\mathcal{A}$ such that

$$\lim_{n\to\infty}d(A,A_n)=0. \qquad (3.4.3)$$

By the last inequality from (3.4.2), the assertion of the theorem means that
$\mathbf{P}(A)=\lim_{n\to\infty}\mathbf{P}(A_n)$ and that each event $A\in\mathfrak{A}$ can be represented, up to a set of
zero probability, as a limit of a sequence of events from the generating algebra $\mathcal{A}$
(see also Appendix 1).


Proof⁷ We will call an event $A\in\mathfrak{F}$ approximable if there exists a sequence $A_n\in\mathcal{A}$
possessing property (3.4.3), i.e. $d(A_n,A)\to 0$.

Since $d(A,A)=0$, the class of approximable events $\mathfrak{A}^*$ contains $\mathcal{A}$. Therefore
to prove the theorem it suffices to verify that $\mathfrak{A}^*$ is a $\sigma$-algebra.

The fact that $\mathfrak{A}^*$ is an algebra is obvious, for the relations $A\in\mathfrak{A}^*$ and
$B\in\mathfrak{A}^*$ imply that $\overline{A}$, $A\cup B$, $A\cap B\in\mathfrak{A}^*$. (For instance, if $d(A,A_n)\to 0$ and
$d(B,B_n)\to 0$, then by the third inequality in (3.4.2) one has $d(AB,A_nB_n)\le
d(A,A_n)+d(B,B_n)\to 0$, so that $AB\in\mathfrak{A}^*$.)

Now let $C=\bigcup_{k=1}^{\infty}C_k$ where $C_k\in\mathfrak{A}^*$. Since $\mathfrak{A}^*$ is an algebra, we have
$D_n=\bigcup_{k=1}^{n}C_k\in\mathfrak{A}^*$; moreover,

$$d(D_n,C)=\mathbf{P}(C-D_n)=\mathbf{P}(C)-\mathbf{P}(D_n)\to 0.$$

Therefore one can choose $A_n\in\mathcal{A}$ so that $d(D_n,A_n)<1/n$, and consequently by
virtue of (3.4.2) we have

$$d(C,A_n)\le d(C,D_n)+d(D_n,A_n)\to 0.$$

Thus $C\in\mathfrak{A}^*$ and hence $\mathfrak{A}^*$ forms a $\sigma$-algebra. The theorem is proved.


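Proof of Theorem 3.4.3 is now easy. If $A_1\in\mathfrak{A}_1$ and $A_2\in\mathfrak{A}_2$, then by Theorem 3.4.4
there exist sequences $A_{1n}\in\mathcal{A}_1$ and $A_{2n}\in\mathcal{A}_2$ such that $d(A_i,A_{in})\to 0$ as $n\to\infty$,
$i=1,2$. Putting $B=A_1A_2$ and $B_n=A_{1n}A_{2n}$, we obtain that

$$d(B,B_n)\le d(A_1,A_{1n})+d(A_2,A_{2n})\to 0$$

as $n\to\infty$ and

$$\mathbf{P}(A_1A_2)=\lim_{n\to\infty}\mathbf{P}(B_n)=\lim_{n\to\infty}\mathbf{P}(A_{1n})\mathbf{P}(A_{2n})=\mathbf{P}(A_1)\mathbf{P}(A_2).$$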
noo noo 


3.4.3 Relations Between the Introduced Notions 


We will need one more definition. Let $\xi$ be a random variable (or vector) given on a
probability space $\langle\Omega,\mathfrak{F},\mathbf{P}\rangle$.


⁷The theorem is also a direct consequence of the lemma from Appendix 1.




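Definition 3.4.4 The class $\mathfrak{F}_\xi$ of events from $\mathfrak{F}$ of the form $A=\xi^{-1}(B)=
\{\omega:\xi(\omega)\in B\}$, where $B$ are Borel sets, is called the $\sigma$-algebra generated by the
random variable $\xi$.


It is evident that $\mathfrak{F}_\xi$ is a $\sigma$-algebra since to each operation on sets $A$ there corresponds
the same operation on the sets $B=\xi(A)$ forming a $\sigma$-algebra.

The $\sigma$-algebra $\mathfrak{F}_\xi$ generated by the random variable $\xi$ will also be denoted by
$\sigma(\xi)$.

Consider, for instance, a probability space $\langle\Omega,\mathfrak{B},\mathbf{P}\rangle$, where $\Omega=\mathbb{R}$ is the real
line and $\mathfrak{B}$ is the $\sigma$-algebra of Borel sets. If

$$\xi=\xi(\omega)=\begin{cases}0, & \omega<0,\\ 1, & \omega\ge 0,\end{cases}$$

then $\mathfrak{F}_\xi$ clearly consists of four sets: $\mathbb{R}$, $\emptyset$, $\{\omega<0\}$ and $\{\omega\ge 0\}$. Such a random
variable $\xi$ cannot distinguish “finer” sets from $\mathfrak{B}$. On the other hand, it is obvious
that $\xi$ will be measurable ($\{\xi\in B\}\in\mathfrak{B}_1$) with respect to any other “richer”
sub-$\sigma$-algebra $\mathfrak{B}_1$, such that $\sigma(\xi)\subset\mathfrak{B}_1\subset\mathfrak{B}$.

If $\xi=\xi(\omega)=\lfloor\omega\rfloor$ is the integral part of $\omega$, then $\mathfrak{F}_\xi$ will be the $\sigma$-algebra of sets
composed of the events $\{k\le\omega<k+1\}$, $k=\ldots,-1,0,1,\ldots$

Finally, if $\xi(\omega)=\varphi(\omega)$ where $\varphi$ is continuous and monotone, $\varphi(\infty)=\infty$ and
$\varphi(-\infty)=-\infty$, then $\mathfrak{F}_\xi$ coincides with the $\sigma$-algebra of Borel sets $\mathfrak{B}$.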


Lemma 3.4.1 Let $\xi$ and $\eta$ be two random variables given on $\langle\Omega,\mathfrak{F},\mathbf{P}\rangle$, the variable
$\xi$ being measurable with respect to $\sigma(\eta)$. Then $\xi$ and $\eta$ are functionally related, i.e.
there exists a Borel function $g$ such that $\xi=g(\eta)$.


Proof By assumption,

$$A_{k,n}=\Bigl\{\xi\in\Bigl[\frac{k}{2^n},\frac{k+1}{2^n}\Bigr)\Bigr\}\in\sigma(\eta).$$

Denote by $B_{k,n}=\{\eta(\omega):\omega\in A_{k,n}\}$ the images of the sets $A_{k,n}$ on the line $\mathbb{R}$ under
the mapping $\eta(\omega)$ and put $g_n(x)=k/2^n$ for $x\in B_{k,n}$. Then $g_n(\eta)=[2^n\xi]/2^n$ and
because $A_{k,n}\in\sigma(\eta)$, $B_{k,n}\in\mathfrak{B}$ and $g_n$ is a Borel function. Since $g_n(x)\uparrow$ for any $x$,
the limit $\lim_{n\to\infty}g_n(x)=g(x)$ exists and is also a Borel function. It remains to
observe that $\xi=\lim_{n\to\infty}g_n(\eta)=g(\eta)$ by the very construction.


Now we formulate an evident proposition relating independence of random variables
and $\sigma$-algebras.

Random variables $\xi_1,\ldots,\xi_n$ are independent if and only if the $\sigma$-algebras
$\sigma(\xi_1),\ldots,\sigma(\xi_n)$ are independent.

This is a direct consequence of the definitions of independence of random variables
and $\sigma$-algebras.

Now we can prove Theorem 3.4.1. First note that finite unions of semi-intervals
$[\cdot,\cdot)$ (perhaps with infinite end points) form an algebra $\mathcal{A}$ generating the Borel
$\sigma$-algebra on the line: $\mathfrak{B}=\sigma(\mathcal{A})$.




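Proof of Theorem 3.4.1 Since in one direction the assertion of the theorem is obvious,
it suffices to verify that the equality $F(x_1,\ldots,x_n)=F_{\xi_1}(x_1)\cdots F_{\xi_n}(x_n)$ for
the joint distribution function implies the independence of $\sigma(\xi_1),\ldots,\sigma(\xi_n)$. Put for
simplicity $n=2$ and denote by $\Delta$ and $\Lambda$ the semi-intervals $[x_1,x_2)$ and $[y_1,y_2)$,
respectively. The following equalities hold:

$$\begin{aligned}
\mathbf{P}(\xi_1\in\Delta,\ \xi_2\in\Lambda)&=\mathbf{P}\bigl(\xi_1\in[x_1,x_2),\ \xi_2\in[y_1,y_2)\bigr)\\
&=F(x_2,y_2)-F(x_1,y_2)-F(x_2,y_1)+F(x_1,y_1)\\
&=\bigl(F_{\xi_1}(x_2)-F_{\xi_1}(x_1)\bigr)\bigl(F_{\xi_2}(y_2)-F_{\xi_2}(y_1)\bigr)\\
&=\mathbf{P}\{\xi_1\in\Delta\}\,\mathbf{P}\{\xi_2\in\Lambda\}.
\end{aligned}$$

Consequently, if $\Delta_i$, $i=1,\ldots,n$, and $\Lambda_j$, $j=1,\ldots,m$, are two systems of
disjoint semi-intervals, then

$$\begin{aligned}
\mathbf{P}\Bigl(\xi_1\in\bigcup_{i=1}^{n}\Delta_i,\ \xi_2\in\bigcup_{j=1}^{m}\Lambda_j\Bigr)&=\sum_{i,j}\mathbf{P}(\xi_1\in\Delta_i,\ \xi_2\in\Lambda_j)\\
&=\sum_{i,j}\mathbf{P}(\xi_1\in\Delta_i)\,\mathbf{P}(\xi_2\in\Lambda_j)\\
&=\mathbf{P}\Bigl(\xi_1\in\bigcup_{i=1}^{n}\Delta_i\Bigr)\mathbf{P}\Bigl(\xi_2\in\bigcup_{j=1}^{m}\Lambda_j\Bigr).
\end{aligned} \qquad (3.4.4)$$

But the class of events $\{\omega:\xi(\omega)\in A\}=\xi^{-1}(A)$, where $A\in\mathcal{A}$, forms, along with $\mathcal{A}$,
an algebra (we will denote it by $\alpha(\xi)$), and one has $\sigma(\alpha(\xi))=\sigma(\xi)$. In (3.4.4)
we proved that $\alpha(\xi_1)$ and $\alpha(\xi_2)$ are independent. Therefore by Theorem 3.4.3 the
$\sigma$-algebras $\sigma(\xi_1)=\sigma(\alpha(\xi_1))$ and $\sigma(\xi_2)=\sigma(\alpha(\xi_2))$ are also independent. The
theorem is proved.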


It is convenient to state the following fact as a theorem. 


Theorem 3.4.5 Let $\varphi_1$ and $\varphi_2$ be Borel functions and $\xi_1$ and $\xi_2$ be independent random variables. Then $\eta_1 = \varphi_1(\xi_1)$ and $\eta_2 = \varphi_2(\xi_2)$ are also independent random variables.


Proof We have to verify that, for any Borel sets $B_1$ and $B_2$,

$$\mathbf{P}\bigl(\varphi_1(\xi_1) \in B_1,\ \varphi_2(\xi_2) \in B_2\bigr) = \mathbf{P}\bigl(\varphi_1(\xi_1) \in B_1\bigr)\,\mathbf{P}\bigl(\varphi_2(\xi_2) \in B_2\bigr). \tag{3.4.5}$$

But the sets $\{x : \varphi_i(x) \in B_i\} = \varphi_i^{-1}(B_i) = B_i^*$, $i = 1, 2$, are again Borel sets. Therefore

$$\{\omega : \varphi_i(\xi_i) \in B_i\} = \{\omega : \xi_i \in B_i^*\},$$

and the required multiplicativity of probability (3.4.5) follows from the independence of $\xi_i$. The theorem is proved.




Let $\{\xi_n\}_{n=1}^{\infty}$ be a sequence of independent random variables. Consider the random variables $\xi_k, \xi_{k+1}, \ldots, \xi_m$ where $k < m \le \infty$. Denote by $\sigma(\xi_k, \ldots, \xi_m)$ (for $m = \infty$ we will write $\sigma(\xi_k, \xi_{k+1}, \ldots)$) the $\sigma$-algebra generated by the events $\bigcap_{i=k}^{m} A_i$, where $A_i \in \sigma(\xi_i)$.


Definition 3.4.5 The $\sigma$-algebra $\sigma(\xi_k, \ldots, \xi_m)$ is said to be generated by the random variables $\xi_k, \ldots, \xi_m$.


In the sequel we will need the following proposition.


Theorem 3.4.6 For any $k \ge 1$, the $\sigma$-algebra $\sigma(\xi_{n+k})$ is independent of $\sigma(\xi_1, \ldots, \xi_n)$.


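Proof To prove the assertion, we make use of Theorem 3.4.3. To this end we have to verify that the algebra $\mathcal{A}$ generated by sets of the form $B = \bigcap_{i=1}^{n} A_i$, where $A_i \in \sigma(\xi_i)$, is independent of $\sigma(\xi_{n+k})$. Let $A \in \sigma(\xi_{n+k})$; then it follows from the independence of the $\sigma$-algebras $\sigma(\xi_1), \sigma(\xi_2), \ldots, \sigma(\xi_n), \sigma(\xi_{n+k})$ that

$$\mathbf{P}(AB) = \mathbf{P}(A)\,\mathbf{P}(A_1)\cdots\mathbf{P}(A_n) = \mathbf{P}(A)\cdot\mathbf{P}(B).$$

In a similar way we verify that

$$\mathbf{P}\Bigl(\bigcup_{i=1}^{m} B_i A\Bigr) = \mathbf{P}\Bigl(\bigcup_{i=1}^{m} B_i\Bigr)\,\mathbf{P}(A)$$

(one just has to represent $\bigcup_{i=1}^{m} B_i$ as a union of disjoint events from $\mathcal{A}$). Thus the algebra $\mathcal{A}$ is independent of $\sigma(\xi_{n+k})$. Hence $\sigma(\xi_1, \ldots, \xi_n)$ and $\sigma(\xi_{n+k})$ are independent. The theorem is proved.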


It is not hard to see that similar conclusions can be made about vector-valued random variables $\xi_1, \xi_2, \ldots$ defining their independence using the relation

$$\mathbf{P}(\xi_1 \in B_1, \ldots, \xi_n \in B_n) = \prod_{j=1}^{n} \mathbf{P}(\xi_j \in B_j),$$

where $B_j$ are Borel sets in spaces of respective dimensions.

In conclusion of this section note that one can always construct a probability space $(\Omega, \mathfrak{F}, \mathbf{P})$ (for instance, $(\mathbb{R}^n, \mathfrak{B}^n, \mathbf{P}_\xi)$) on which independent random variables $\xi_1, \ldots, \xi_n$ with prescribed distribution functions $F_{\xi_j}$ are given whenever these distributions $F_{\xi_j}$ are known. This follows immediately from Sect. 3.3, since in our case the joint distribution function $F_\xi(x_1, \ldots, x_n)$ of the vector $\xi = (\xi_1, \ldots, \xi_n)$ is uniquely determined by the distribution functions $F_{\xi_j}(x)$ of the variables $\xi_j$:

$$F_\xi(x_1, \ldots, x_n) = \prod_{j=1}^{n} F_{\xi_j}(x_j).$$


3.5 * On Infinite Sequences of Random Variables 


We have already mentioned infinite sequences of random variables. Such sequences 
will repeatedly be objects of our studies below. However, there arises the question 
of whether one can define an infinite sequence on a probability space in such a way 
that its components possess certain prescribed properties (for instance, that they will 
be independent and identically distributed). 

As we saw, one can always define a finite sequence of independent random variables by choosing for the “compound” random variable $(\xi_1, \ldots, \xi_n)$ the sample space $\mathbb{R}_1 \times \mathbb{R}_2 \times \cdots \times \mathbb{R}_n = \mathbb{R}^n$ and $\sigma$-algebra $\mathfrak{B}_1 \times \mathfrak{B}_2 \times \cdots \times \mathfrak{B}_n = \mathfrak{B}^n$ generated by sets of the form $B_1 \times B_2 \times \cdots \times B_n \subset \mathbb{R}^n$, $B_i$ being Borel sets. It suffices to define probability on the algebra of these sets. In the infinite-dimensional case, however, the situation is more complicated. Theorem 3.2.1 and its extensions to the multivariate case are insufficient here. One should define probability on an algebra of events from $\mathbb{R}^\infty = \prod_{k=1}^{\infty} \mathbb{R}_k$ so that its closure under countably many operations $\cup$ and $\cap$ forms the $\sigma$-algebra $\mathfrak{B}^\infty$ generated by the products $\prod_k B_{j_k}$, $B_{j_k} \in \mathfrak{B}_{j_k}$.

Let $N$ be a subset of integers. Denote by $\mathbb{R}^N = \prod_{k \in N} \mathbb{R}_k$ the direct product of the spaces $\mathbb{R}_k$ over $k \in N$, and put $\mathfrak{B}^N = \prod_{k \in N} \mathfrak{B}_k$. We say that distributions $\mathbf{P}_{N'}$ and $\mathbf{P}_{N''}$ on $(\mathbb{R}^{N'}, \mathfrak{B}^{N'})$ and $(\mathbb{R}^{N''}, \mathfrak{B}^{N''})$, respectively, are consistent if the measures induced by $\mathbf{P}_{N'}$ and $\mathbf{P}_{N''}$ on the intersection $\mathbb{R}^N = \mathbb{R}^{N'} \cap \mathbb{R}^{N''}$ (here $N = N' \cap N''$) coincide with each other. The measures on $\mathbb{R}^N$ are said to be the projections of $\mathbf{P}_{N'}$ and $\mathbf{P}_{N''}$, respectively, on $\mathbb{R}^N$. An answer to the above question about the existence of an infinite sequence of random variables is given by the following theorem (the proof of which is given in Appendix 2).


Theorem 3.5.1 (Kolmogorov) Specifying a family of consistent distributions $\mathbf{P}_N$ on finite-dimensional spaces $\mathbb{R}^N$ defines a unique probability measure $\mathbf{P}_\infty$ on $(\mathbb{R}^\infty, \mathfrak{B}^\infty)$ such that each probability $\mathbf{P}_N$ is the projection of $\mathbf{P}_\infty$ onto $\mathbb{R}^N$.


It follows from this theorem, in particular, that one can always define on an appropriate space an infinite sequence of arbitrary independent random variables. Indeed, direct products of measures given on $\mathbb{R}_1, \mathbb{R}_2, \ldots$ for different products $\mathbb{R}^{N'}$ and $\mathbb{R}^{N''}$ are always consistent.


3.6 Integrals 


3.6.1 Integral with Respect to Measure 


As we have already noted, defining a probability space includes specifying a finite 
countably additive measure. This enables one to consider integrals with respect to 
the measure, 


$$\int_\Omega g(\xi(\omega))\,\mathbf{P}(d\omega) \tag{3.6.1}$$

over the set $\Omega$ for a Borel function $g$ and any random variable $\xi$ on $(\Omega, \mathfrak{F}, \mathbf{P})$ (recall that $g(x)$ is said to be Borel if, for any $t$, $\{x : g(x) < t\}$ is a Borel set on the real line).

The definition, construction and basic properties of the integral with respect to a 
measure are assumed to be familiar to the reader. If the reader feels his or her back- 
ground is insufficient in this aspect, we recommend Appendix 3 which contains all 
the necessary information. However, the reader could skip this material if he/she is 
willing to restrict him/herself to considering only discrete or absolutely continuous 
distributions for which integrals with respect to a measure become sums or conven- 
tional Riemann integrals. It would also be useful for the sequel to know the Stieltjes 
integral; see the comments in the next subsection. 

We already know that a random variable $\xi(\omega)$ induces a measure $F_\xi$ on the real line which is specified by the equality

$$F_\xi([x, y)) = \mathbf{P}(x \le \xi < y) = F_\xi(y) - F_\xi(x).$$

Using this measure, one can write the integral (3.6.1) as

$$\int_\Omega g(\xi(\omega))\,\mathbf{P}(d\omega) = \int_{\mathbb{R}} g(x)\,F_\xi(dx).$$

This is just the result of the substitution $x = \xi(\omega)$. It can be proved simply by writing down the definitions of both integrals. The integral on the right-hand side is called the Lebesgue–Stieltjes integral of the function $g(x)$ with respect to the measure $F_\xi$ and can also be written as

$$\int g(x)\,dF_\xi(x). \tag{3.6.2}$$


3.6.2 The Stieltjes Integral 


The integral (3.6.2) is often just called the Stieltjes integral, or the Riemann–Stieltjes integral, which is defined in a somewhat different way and for a narrower class of functions.

If $g(x)$ is a continuous function, then the Lebesgue–Stieltjes integral coincides with the Riemann–Stieltjes integral which is equal by definition to

$$\int g(x)\,dF(x) = \lim_{\substack{b\to\infty \\ a\to-\infty}} \lim_{N\to\infty} \sum_{k=0}^{N} g(\tilde{x}_k)\bigl[F(x_{k+1}) - F(x_k)\bigr], \tag{3.6.3}$$

where the limit on the right-hand side does not depend on the choice of partitions $x_0, x_1, \ldots, x_N$ of the semi-intervals $[a, b)$ and points $\tilde{x}_k \in \Delta_k = [x_k, x_{k+1})$. Partitions $x_0, x_1, \ldots, x_N$ are different for different $N$'s and have the property that $\max_k (x_{k+1} - x_k) \to 0$ as $N \to \infty$.

Indeed, as we know (see Appendix 3), the Lebesgue–Stieltjes integral is

$$\int g(x)\,F_\xi(dx) = \lim_{\substack{b\to\infty \\ a\to-\infty}} \lim_{N\to\infty} \int_a^b g_N(x)\,F_\xi(dx), \tag{3.6.4}$$




where $g_N$ is any sequence of simple functions (assuming finitely many values) converging monotonically to $g(x)$. We see from these definitions that it suffices to show that the integrals $\int_a^b$ with finite integration limits coincide. Since the Lebesgue–Stieltjes integral $\int_a^b g\,dF$ of a continuous function $g$ always exists, we could obtain its value by taking the sequence $g_N$ to be any of the two sequences of simple functions $g_N^*$ and $g_N^{**}$ which are constant on the semi-intervals $\Delta_k$ and equal on them to

$$g_N^*(x_k) = \sup_{x \in \Delta_k} g(x) \quad\text{and}\quad g_N^{**}(x_k) = \inf_{x \in \Delta_k} g(x),$$

respectively. Both sequences in (3.6.4) constructed from $g_N^*$ and $g_N^{**}$ will clearly converge monotonically from different sides to the same limit equal to the Lebesgue–Stieltjes integral

$$\int_a^b g(x)\,dF(x).$$

But for any $\tilde{x}_k \in \Delta_k$, one has

$$g_N^{**}(x_k) \le g(\tilde{x}_k) \le g_N^{*}(x_k),$$

and therefore the integral sum in (3.6.3) will be between the bounds

$$\int_a^b g_N^{**}\,dF(x) \le \sum_{k=0}^{N} g(\tilde{x}_k)\bigl[F(x_{k+1}) - F(x_k)\bigr] \le \int_a^b g_N^{*}\,dF(x).$$

These inequalities prove the required assertion about the coincidence of the integrals.
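The integral sums in (3.6.3) are easy to evaluate numerically. Below is a minimal Python sketch under stated assumptions: the helper `stieltjes_sum`, the uniform partition, the midpoint choice of $\tilde{x}_k$, and the exponential distribution function are all illustrative choices, not part of the text. For $g(x) = \cos x$ and the exponential law with unit rate the exact value of the integral is $\int_0^\infty \cos x\, e^{-x}\,dx = 1/2$.

```python
import math

def stieltjes_sum(g, F, a, b, n_parts):
    """Integral sum of g with respect to F over [a, b) on a uniform partition."""
    h = (b - a) / n_parts
    total = 0.0
    for k in range(n_parts):
        x_k = a + k * h
        x_next = a + (k + 1) * h
        x_tilde = 0.5 * (x_k + x_next)  # any point of [x_k, x_{k+1}) would do
        total += g(x_tilde) * (F(x_next) - F(x_k))
    return total

def exp_cdf(x):
    """Distribution function of the exponential law with rate 1 (illustrative)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

# The mass beyond b = 40 is of order e^{-40}, so truncating there is harmless.
approx = stieltjes_sum(math.cos, exp_cdf, 0.0, 40.0, 200_000)
print(approx)  # close to 0.5
```

Replacing the midpoint by the left or right end point of each $\Delta_k$ changes the sum only slightly, in line with the independence of the limit from the choice of $\tilde{x}_k$.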
It is not hard to verify that (3.6.3) and (3.6.4) will also coincide when $F(x)$ is continuous and $g(x)$ is a function of bounded variation. In that case,

$$\int_a^b g(x)\,dF(x) = g(x)F(x)\Big|_a^b - \int_a^b F(x)\,dg(x).$$

Making use of this fact, we can extend the definition of the Riemann–Stieltjes integral to the case when $g(x)$ is a function of bounded variation and $F(x)$ is an arbitrary distribution function. Indeed, let $F(x) = F_c(x) + F_d(x)$ be a representation of $F(x)$ as a sum of its continuous and discrete components, and $y_1, y_2, \ldots$ be the jump points of $F_d(x)$:

$$p_k = F_d(y_k + 0) - F_d(y_k) > 0.$$

Then one has to put by definition

$$\int g(x)\,dF(x) = \sum_k p_k\,g(y_k) + \int g(x)\,dF_c(x),$$

where the Riemann–Stieltjes integral $\int g\,dF_c(x)$ can be understood, as we have already noted, in the sense of definition (3.6.3).
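As an illustration of this definition, here is a minimal Python sketch for a mixed distribution. Everything specific in it is an assumption chosen for the example: an atom of mass $1/2$ at the point $0$, a continuous component equal to half of the uniform law on $[0, 1]$, and $g(x) = (x+1)^2$. The exact value is $\tfrac12\cdot 1 + \tfrac12\int_0^1 (x+1)^2\,dx = 5/3$.

```python
def stieltjes_sum(g, F, a, b, n_parts):
    """Integral sum of g with respect to F over [a, b) on a uniform partition."""
    h = (b - a) / n_parts
    total = 0.0
    for k in range(n_parts):
        x_k = a + k * h
        x_next = a + (k + 1) * h
        total += g(0.5 * (x_k + x_next)) * (F(x_next) - F(x_k))
    return total

def continuous_part(x):
    """F_c: half of the uniform distribution function on [0, 1] (illustrative)."""
    return 0.5 * min(max(x, 0.0), 1.0)

# Jump points y_k and jump sizes p_k of the discrete component F_d (illustrative).
JUMPS = [(0.0, 0.5)]

def g(x):
    return (x + 1.0) ** 2

def mixed_integral(g, jumps, F_c, a, b, n_parts=100_000):
    """Sum over the jumps plus the integral with respect to the continuous part."""
    discrete = sum(p_k * g(y_k) for y_k, p_k in jumps)
    return discrete + stieltjes_sum(g, F_c, a, b, n_parts)

value = mixed_integral(g, JUMPS, continuous_part, -1.0, 2.0)
print(value)  # close to 5/3
```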

We will say, as is generally accepted, that $\int g\,dF$ exists if the integral $\int |g|\,dF$ is finite. It is easy to see from the definition of the Stieltjes integral that, for step functions $F(x)$ (the distribution is discrete), the integral becomes the sum

$$\int g(x)\,dF(x) = \sum_k g(x_k)\bigl(F(x_k + 0) - F(x_k)\bigr) = \sum_k g(x_k)\,\mathbf{P}(\xi = x_k),$$

where $x_1, x_2, \ldots$ are jump points of $F(x)$. If

$$F(x) = \int_{-\infty}^{x} p(t)\,dt$$

is absolutely continuous and $p(x)$ and $g(x)$ are Riemann integrable, then the Stieltjes integral

$$\int g(x)\,dF(x) = \int g(x)\,p(x)\,dx$$

becomes a conventional Riemann integral.

We again note that for a reader who is not familiar with Stieltjes integral tech- 
niques and integration with respect to measures, it is possible to continue reading 
the book keeping in mind only the last two interpretations of the integral. This would 
be quite sufficient for an understanding of the exposition. Moreover, most of the 
distributions which are important from the practical point of view are just of one of 
these types: either discrete or absolutely continuous. 

We recall some other properties of the Stieltjes integral (following immediately from definitions (3.6.4) or (3.6.3) and (3.6.5)):

$$\int_a^b dF = F(b) - F(a);$$

$$\int_a^b g\,dF = \int_a^c g\,dF + \int_c^b g\,dF \quad\text{if $g$ or $F$ is continuous at the point $c$;}$$

$$\int (g_1 + g_2)\,dF = \int g_1\,dF + \int g_2\,dF;$$

$$\int c\,g\,dF = c\int g\,dF \quad\text{for } c = \text{const};$$

$$\int_a^b g\,dF = gF\Big|_a^b - \int_a^b F\,dg$$

if $g$ is a function of bounded variation.


3.6.3 Integrals of Multivariate Random Variables. 
The Distribution of the Sum of Independent 
Random Variables 


Integrals with respect to measure (3.6.1) make sense for multivariate variables $\xi(\omega) = (\xi_1(\omega), \ldots, \xi_n(\omega))$ as well (one cannot say the same about Riemann–Stieltjes integrals (3.6.3)). We mean here the integral

$$\int_\Omega g(\xi_1(\omega), \ldots, \xi_n(\omega))\,\mathbf{P}(d\omega), \tag{3.6.5}$$

where $g$ is a measurable function mapping $\mathbb{R}^n$ into $\mathbb{R}$, so that $g(\xi_1(\omega), \ldots, \xi_n(\omega))$ is a measurable mapping of $\Omega$ into $\mathbb{R}$.

If $(\mathbb{R}^n, \mathfrak{B}^n, F_\xi)$ is a sample probability space for $\xi$, then the integral (3.6.5) can be written as

$$\int_{\mathbb{R}^n} g(x)\,F_\xi(dx), \qquad x = (x_1, \ldots, x_n) \in \mathbb{R}^n.$$
R" 


Now turn to the case when the components $\xi_1, \ldots, \xi_n$ of the vector $\xi$ are independent and assume first that $n = 2$. For sets

$$B = B_1 \times B_2 = \{(x_1, x_2) : x_1 \in B_1,\ x_2 \in B_2\} \subset \mathbb{R}^2,$$

where $B_1$ and $B_2$ are measurable subsets of $\mathbb{R}$, one has the equality

$$\mathbf{P}(\xi \in B) = \mathbf{P}(\xi_1 \in B_1,\ \xi_2 \in B_2) = \mathbf{P}(\xi_1 \in B_1)\,\mathbf{P}(\xi_2 \in B_2). \tag{3.6.6}$$

In that case one says that the measure $F_{\xi_1,\xi_2}(dx_1, dx_2) = \mathbf{P}(\xi_1 \in dx_1,\ \xi_2 \in dx_2)$ on $\mathbb{R}^2$, corresponding to $(\xi_1, \xi_2)$, is a direct product of the measures

$$F_{\xi_1}(dx_1) = \mathbf{P}(\xi_1 \in dx_1) \quad\text{and}\quad F_{\xi_2}(dx_2) = \mathbf{P}(\xi_2 \in dx_2).$$

As we already know, equality (3.6.6) uniquely specifies a measure on $(\mathbb{R}^2, \mathfrak{B}^2)$ from the given distributions of $\xi_1$ and $\xi_2$ on $(\mathbb{R}, \mathfrak{B})$. It turns out that the integral

$$\int g(x_1, x_2)\,F_{\xi_1,\xi_2}(dx_1, dx_2) \tag{3.6.7}$$

with respect to the measure $F_{\xi_1,\xi_2}$ can be expressed in terms of integrals with respect to the measures $F_{\xi_1}$ and $F_{\xi_2}$. Namely, Fubini’s theorem holds true (for the proof see Appendix 3 or property 5A in Sect. 4.8).


Theorem 3.6.1 (Theorem on iterated integration) For a Borel function $g(x, y) \ge 0$ and independent $\xi_1$ and $\xi_2$,

$$\int g(x_1, x_2)\,F_{\xi_1,\xi_2}(dx_1, dx_2) = \int \Bigl[\int g(x_1, x_2)\,F_{\xi_2}(dx_2)\Bigr] F_{\xi_1}(dx_1). \tag{3.6.8}$$

If $g(x, y)$ can assume values of different signs, then the existence of the integral on the left-hand side of (3.6.8) is required for the equality (3.6.8). The order of integration on the right-hand side of (3.6.8) may be changed.


It is shown in Appendix 3 that the measurability of g(x, y) implies that of the 
integrands on the right-hand side of (3.6.8). 


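Corollary 3.6.1 Let $g(x_1, x_2) = g_1(x_1)g_2(x_2)$. Then, if at least one of the following three conditions is met:

(1) $g_1 \ge 0$, $g_2 \ge 0$,
(2) $\int g_1(x_1)g_2(x_2)\,F_{\xi_1,\xi_2}(dx_1, dx_2)$ exists,
(3) $\int g_j(x_j)\,F_{\xi_j}(dx_j)$, $j = 1, 2$, exist,

then

$$\int g_1(x_1)g_2(x_2)\,F_{\xi_1,\xi_2}(dx_1, dx_2) = \int g_1(x_1)\,F_{\xi_1}(dx_1) \int g_2(x_2)\,F_{\xi_2}(dx_2). \tag{3.6.9}$$

To avoid trivial complications, we assume that $\mathbf{P}(g_j(\xi_j) = 0) \ne 1$, $j = 1, 2$.


Proof Under any of the first two conditions, the assertion of the corollary follows immediately from Fubini’s theorem. For arbitrary $g_1$, $g_2$, put $g_j = g_j^+ - g_j^-$, $g_j^\pm \ge 0$, $j = 1, 2$. If $\int g_j^\pm\,dF_{\xi_j} < \infty$ (we will use here the abridged notation for integrals), then

$$\begin{aligned}
\int g_1 g_2\,dF_{\xi_1}\,dF_{\xi_2} &= \int g_1^+ g_2^+\,dF_{\xi_1}\,dF_{\xi_2} - \int g_1^+ g_2^-\,dF_{\xi_1}\,dF_{\xi_2} \\
&\quad - \int g_1^- g_2^+\,dF_{\xi_1}\,dF_{\xi_2} + \int g_1^- g_2^-\,dF_{\xi_1}\,dF_{\xi_2} \\
&= \int g_1^+\,dF_{\xi_1} \int g_2^+\,dF_{\xi_2} - \int g_1^+\,dF_{\xi_1} \int g_2^-\,dF_{\xi_2} \\
&\quad - \int g_1^-\,dF_{\xi_1} \int g_2^+\,dF_{\xi_2} + \int g_1^-\,dF_{\xi_1} \int g_2^-\,dF_{\xi_2} \\
&= \int g_1\,dF_{\xi_1} \int g_2\,dF_{\xi_2}.
\end{aligned}$$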


Corollary 3.6.2 In the special case when $g(x_1, x_2) = I_B(x_1, x_2)$ is the indicator of a set $B \in \mathfrak{B}^2$, we obtain the formula for sequential computation of the measure of $B$:

$$\mathbf{P}\bigl((\xi_1, \xi_2) \in B\bigr) = \int \mathbf{P}\bigl((x_1, \xi_2) \in B\bigr)\,F_{\xi_1}(dx_1).$$

The probability of the event $\{(x_1, \xi_2) \in B\}$ could also be written as $\mathbf{P}(\xi_2 \in B_{x_1}) = \mathbf{P}_{\xi_2}(B_{x_1})$ where $B_{x_1} = \{x_2 : (x_1, x_2) \in B\}$ is the “section” of the set $B$ at the point $x_1$. If $B = \{(x_1, x_2) : x_1 + x_2 < x\}$, we get

$$\begin{aligned}
\mathbf{P}\bigl((\xi_1, \xi_2) \in B\bigr) = \mathbf{P}(\xi_1 + \xi_2 < x) &= F_{\xi_1+\xi_2}(x) \\
&= \int \mathbf{P}(x_1 + \xi_2 < x)\,F_{\xi_1}(dx_1) \\
&= \int F_{\xi_2}(x - x_1)\,dF_{\xi_1}(x_1).
\end{aligned} \tag{3.6.10}$$

We have obtained a formula for the distribution function of the sum of independent random variables expressing $F_{\xi_1+\xi_2}$ in terms of $F_{\xi_1}$ and $F_{\xi_2}$. The integral on the right-hand side of (3.6.10) is called the convolution of the distribution functions $F_{\xi_1}(x)$ and $F_{\xi_2}(x)$ and is denoted by $F_{\xi_1} * F_{\xi_2}(x)$. In the same way one can obtain the equality

$$\mathbf{P}(\xi_1 + \xi_2 < x) = \int_{-\infty}^{\infty} F_{\xi_1}(x - t)\,dF_{\xi_2}(t).$$

Observe that the right-hand side here could also be considered as a result of integrating

$$\int dF_{\xi_1}(t)\,F_{\xi_2}(x - t)$$

by parts.

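If at least one of the distribution functions has a density, the convolution also has a density. This follows immediately from the formulas for convolution. Let, for instance,

$$F_{\xi_2}(x) = \int_{-\infty}^{x} f_{\xi_2}(u)\,du.$$

Then

$$\begin{aligned}
F_{\xi_1+\xi_2}(x) &= \int F_{\xi_1}(dt) \int_{-\infty}^{x} f_{\xi_2}(u - t)\,du \\
&= \int_{-\infty}^{x} \Bigl(\int F_{\xi_1}(dt)\,f_{\xi_2}(u - t)\Bigr)\,du,
\end{aligned}$$

so that the density of the sum $\xi_1 + \xi_2$ equals

$$f_{\xi_1+\xi_2}(x) = \int F_{\xi_1}(dt)\,f_{\xi_2}(x - t) = \int f_{\xi_2}(x - t)\,dF_{\xi_1}(t).$$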
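A small case where only one summand has a density can be written out directly. The following is a minimal Python sketch under illustrative assumptions: $\xi_1$ takes the values $0$ and $1$ (a discrete law, so the integral over $F_{\xi_1}(dt)$ is a finite sum over its atoms) and $\xi_2$ is exponential; the parameter values and function names are hypothetical. The sketch compares the density formula with a numerical derivative of the convolution of the distribution functions.

```python
import math

P_SUCCESS = 0.3  # illustrative probability that xi_1 equals 1
RATE = 2.0       # illustrative rate of the exponential variable xi_2

# Atoms of the distribution of xi_1: pairs (point t, mass at t).
ATOMS = [(0.0, 1.0 - P_SUCCESS), (1.0, P_SUCCESS)]

def exp_cdf(x):
    return 1.0 - math.exp(-RATE * x) if x > 0 else 0.0

def exp_density(x):
    return RATE * math.exp(-RATE * x) if x > 0 else 0.0

def sum_cdf(x):
    """Convolution: integral of F_2(x - t) dF_1(t), here a sum over the atoms."""
    return sum(mass * exp_cdf(x - t) for t, mass in ATOMS)

def sum_density(x):
    """Density of the sum: integral of f_2(x - t) dF_1(t)."""
    return sum(mass * exp_density(x - t) for t, mass in ATOMS)

step = 1e-6
for x in (0.5, 1.7, 3.0):
    numeric = (sum_cdf(x + step) - sum_cdf(x - step)) / (2 * step)
    print(x, sum_density(x), numeric)
```

Although $\xi_1$ has no density, the sum does, and the two printed columns agree at every point where the density is continuous.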


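Example 3.6.1 Let $\xi_1, \xi_2, \ldots$ be independent random variables uniformly distributed over $[0, 1]$, i.e. $\xi_1, \xi_2, \ldots$ have the same distribution function with density

$$f(x) = \begin{cases} 1, & x \in [0, 1], \\ 0, & x \notin [0, 1]. \end{cases} \tag{3.6.11}$$

Then the density of the sum $\xi_1 + \xi_2$ is

$$f_{\xi_1+\xi_2}(x) = \int_0^1 f(x - t)\,dt = \begin{cases} 0, & x \notin [0, 2], \\ x, & x \in [0, 1], \\ 2 - x, & x \in [1, 2]. \end{cases} \tag{3.6.12}$$

The integral present here is clearly the length of the intersection of the segments $[0, 1]$ and $[x - 1, x]$. The graph of the density of the sum $\xi_1 + \xi_2 + \xi_3$ will consist of three pieces of parabolas:

$$f_{\xi_1+\xi_2+\xi_3}(x) = \int_0^1 f_{\xi_1+\xi_2}(x - t)\,dt = \begin{cases} 0, & x \notin [0, 3], \\[2pt] \dfrac{x^2}{2}, & x \in [0, 1], \\[6pt] 1 - \dfrac{(2 - x)^2}{2} - \dfrac{(x - 1)^2}{2}, & x \in [1, 2], \\[6pt] \dfrac{(3 - x)^2}{2}, & x \in [2, 3]. \end{cases}$$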




Fig. 3.2 Illustration to Example 3.6.1. The upper row visualizes the computation of the convolution integral for the density of $\xi_1 + \xi_2 + \xi_3$. The lower row displays the densities of $\xi_1$, $\xi_1 + \xi_2$, and $\xi_1 + \xi_2 + \xi_3$, respectively


The computation of this integral is visualised in Fig. 3.2, where the shaded areas correspond to the values of $f_{\xi_1+\xi_2+\xi_3}(x)$ for different $x$. The shape of the densities of $\xi_1$, $\xi_1 + \xi_2$ and $\xi_1 + \xi_2 + \xi_3$ is shown in Fig. 3.2b. The graph of the density of the sum $\xi_1 + \xi_2 + \xi_3 + \xi_4$ will consist of four pieces of cubic parabolas and so on. If we shift the origin to the point $n/2$, then, as $n$ increases, the shape (up to a scaling transformation) of the density of the sum $\xi_1 + \cdots + \xi_n$ will be closer and closer to that of the function $e^{-x^2}$. We will see below that this is not due to chance.
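The piecewise formulas of this example can be checked by computing the convolution integral $\int_0^1 f(x - t)\,dt$ numerically. The following is a minimal Python sketch; the midpoint rule, the number of steps and the function names are assumptions made for illustration.

```python
def density_one(x):
    """Density (3.6.11) of a single uniform variable on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def density_two(x):
    """Closed form (3.6.12) for the density of the sum of two variables."""
    if 0.0 <= x <= 1.0:
        return x
    if 1.0 < x <= 2.0:
        return 2.0 - x
    return 0.0

def density_three(x):
    """Closed form for three summands: three pieces of parabolas."""
    if 0.0 <= x <= 1.0:
        return x * x / 2.0
    if 1.0 < x <= 2.0:
        return 1.0 - (2.0 - x) ** 2 / 2.0 - (x - 1.0) ** 2 / 2.0
    if 2.0 < x <= 3.0:
        return (3.0 - x) ** 2 / 2.0
    return 0.0

def convolve_with_uniform(density, x, steps=4000):
    """Midpoint-rule approximation of the integral of density(x - t) over [0, 1]."""
    h = 1.0 / steps
    return h * sum(density(x - (k + 0.5) * h) for k in range(steps))

for x in (0.25, 0.75, 1.0, 1.5, 2.0, 2.5):
    print(x,
          convolve_with_uniform(density_one, x), density_two(x),
          convolve_with_uniform(density_two, x), density_three(x))
```

Each numerically convolved value agrees with the corresponding closed form up to the discretisation error of the midpoint rule.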

In connection with this example we could note that if $\xi$ and $\eta$ are two independent random variables, $\xi$ having the distribution function $F(x)$ and $\eta$ being uniformly distributed over $[0, 1]$, then the density of the sum $\xi + \eta$ at the point $x$ is equal to

$$f_{\xi+\eta}(x) = \int dF(t)\,f_\eta(x - t) = \int_{x-1}^{x} dF(t) = F(x) - F(x - 1).$$


Chapter 4 
Numerical Characteristics of Random Variables 


Abstract This chapter opens with Sect. 4.1 introducing the concept of the expectation of a random variable as the respective Lebesgue integral and deriving its key properties, illustrated by a number of examples. Then the concepts of conditional distribution functions and conditional expectations given an event are presented and discussed in detail in Sect. 4.2, one of the illustrations introducing the ruin problem for the simple random walk. In Sects. 4.3 and 4.4, expectations of independent random variables and those of sums of random numbers of random variables are considered. In Sect. 4.5, Kolmogorov–Prokhorov’s theorem is proved for the case when the number of random terms in the sum is independent of the future, followed by the derivation of Wald’s identity. After that, moments of higher orders are introduced and discussed, starting with the variance in Sect. 4.5 and proceeding to covariance and correlation coefficient and their key properties in Sect. 4.6. Section 4.7 is devoted to the fundamental moment inequalities: Cauchy–Bunjakovsky’s inequality (a.k.a. Cauchy–Schwarz inequality), Hölder’s and Jensen’s inequalities, followed by inequalities for probabilities (Markov’s and Chebyshev’s inequalities). Section 4.8 extends the concept of conditional expectation (given a random variable or sigma-algebra), starting with the discrete case, then turning to square-integrable random variables and using projections, and finally considering the general case based on the Radon–Nikodym theorem (proved in Appendix 3). The properties of the conditional expectation are studied, followed by the introduction of the concept of conditional distribution given a random variable, which is illustrated by several examples in Sect. 4.9.


4.1 Expectation 


Definition 4.1.1 The (mathematical) expectation, or mean value, of a random variable $\xi$ given on a probability space $(\Omega, \mathfrak{F}, \mathbf{P})$ is defined as the quantity

$$\mathbf{E}\xi = \int_\Omega \xi(\omega)\,\mathbf{P}(d\omega).$$

Let $\xi^\pm = \max(0, \pm\xi)$. The values $\mathbf{E}\xi^\pm \ge 0$ are always well defined (see Appendix 3). We will say that $\mathbf{E}\xi$ exists if $\max(\mathbf{E}\xi^+, \mathbf{E}\xi^-) < \infty$.


A.A. Borovkov, Probability Theory, Universitext, 65 
DOI 10.1007/978-1-4471-5201-9_4, © Springer-Verlag London 2013 




We will say that $\mathbf{E}\xi$ is well defined if $\min(\mathbf{E}\xi^+, \mathbf{E}\xi^-) < \infty$. In this case the difference $\mathbf{E}\xi^+ - \mathbf{E}\xi^-$ is always well defined, but $\mathbf{E}\xi = \mathbf{E}\xi^+ - \mathbf{E}\xi^-$ may be $\pm\infty$.


By virtue of the above remarks (see Sect. 3.6) one can also define $\mathbf{E}\xi$ as

$$\mathbf{E}\xi = \int x\,F_\xi(dx) = \int x\,dF(x), \tag{4.1.1}$$

where $F(x)$ is the distribution function of $\xi$. It follows from the definition that $\mathbf{E}\xi$ exists if $\mathbf{E}|\xi| < \infty$. It is not hard to see that $\mathbf{E}\xi$ does not exist if, for instance, $1 - F(x) > 1/x$ for all sufficiently large $x$.

We already know that if $F(x)$ is a step function then the Stieltjes integral (4.1.1) becomes the sum

$$\mathbf{E}\xi = \sum_k x_k\,\mathbf{P}(\xi = x_k).$$

If $F(x)$ has a density $f(x)$, then

$$\mathbf{E}\xi = \int x f(x)\,dx,$$

so that $\mathbf{E}\xi$ is the point of the “centre of gravity” of the distribution $F$ of the unit mass on the real line and corresponds to the natural interpretation of the mean value of the distribution.

If $g(x)$ is a Borel function, then $\eta = g(\xi)$ is again a random variable and

$$\mathbf{E}g(\xi) = \int g(\xi(\omega))\,\mathbf{P}(d\omega) = \int g(x)\,dF(x) = \int x\,dF_{g(\xi)}(x).$$

The last equality follows from definition (4.1.1).
The basic properties of expectations coincide with those of the integral: 


E1. If $a$ and $b$ are constants, then $\mathbf{E}(a + b\xi) = a + b\,\mathbf{E}\xi$.

E2. $\mathbf{E}(\xi_1 + \xi_2) = \mathbf{E}\xi_1 + \mathbf{E}\xi_2$, if any two of the expectations appearing in the formula exist.

E3. If $a \le \xi \le b$, then $a \le \mathbf{E}\xi \le b$. The inequality $\mathbf{E}\xi \le \mathbf{E}|\xi|$ always holds.

E4. If $\xi \ge 0$ and $\mathbf{E}\xi = 0$, then $\xi = 0$ with probability 1.

E5. The probability of an event $A$ can be expressed in terms of expectations as

$$\mathbf{P}(A) = \mathbf{E}\,I(A),$$

where $I(A)$ is the random variable equal to the indicator of the event $A$: $I(A) = 1$ if $\omega \in A$ and $I(A) = 0$ otherwise.


For further properties of expectations, see Appendix 3. 
We consider several examples. 


Example 4.1.1 Expectations related to the Bernoulli scheme. Let $\xi \in \mathbf{B}_p$, i.e. $\xi$ assumes two values: 0 with probability $q$ and 1 with probability $p$, where $p + q = 1$. Then

$$\mathbf{E}\xi = 0 \times \mathbf{P}(\xi = 0) + 1 \times \mathbf{P}(\xi = 1) = p.$$

Now consider a sequence of trials in the Bernoulli scheme until the time of the first “success”. In other words, consider a sequence of independent variables $\xi_1, \xi_2, \ldots$ distributed as $\xi$ until the time

$$\eta := \min\{k \ge 1 : \xi_k = 1\}.$$

It is evident that $\eta$ is a random variable,

$$\mathbf{P}(\eta = k) = q^{k-1}p, \qquad k \ge 1,$$

so that $\eta - 1$ has the geometric distribution. Consequently,

$$\mathbf{E}\eta = \sum_{k=1}^{\infty} k\,q^{k-1}p = \frac{p}{(1 - q)^2} = \frac{1}{p}.$$


If we put $S_n := \sum_{k=1}^{n} \xi_k$, then clearly $\mathbf{E}S_n = np$. Now define, for an integer $N \ge 1$, the random variable $\eta = \min\{k \ge 1 : S_k = N\}$ as the “first passage time” of level $N$ by the sequence $S_n$. One has

$$\mathbf{P}(\eta = k) = \mathbf{P}(S_{k-1} = N - 1)\,p,$$

$$\mathbf{E}\eta = p \sum_{k=N}^{\infty} k \binom{k-1}{N-1} p^{N-1} q^{k-N} = \frac{p^N}{(N-1)!} \sum_{k=N}^{\infty} k(k-1)\cdots(k-N+1)\,q^{k-N}.$$

The sum here is equal to the $N$-th derivative of the function $\psi(z) = \sum_{k=0}^{\infty} z^k = 1/(1 - z)$ at the point $z = q$, i.e. it equals $N!/p^{N+1}$. Thus $\mathbf{E}\eta = N/p$. As we will see below, this equality could be obtained as an obvious consequence of the results of Sect. 4.4.
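The identity $\mathbf{E}\eta = N/p$ for the first passage time of level $N$ can be confirmed by summing the series $\sum_k k\,\mathbf{P}(\eta = k)$ directly. The following is a minimal Python sketch; the function name and the truncation length are assumptions made for illustration (the neglected tail is geometrically small).

```python
from math import comb

def first_passage_mean(level, p, terms=2000):
    """Truncated series sum over k of k * P(eta = k) for the first passage time."""
    q = 1.0 - p
    total = 0.0
    for k in range(level, level + terms):
        # P(eta = k) = C(k-1, N-1) p^(N-1) q^(k-N) * p
        probability = comb(k - 1, level - 1) * p**level * q ** (k - level)
        total += k * probability
    return total

print(first_passage_mean(1, 0.25))  # close to 1 / 0.25 = 4
print(first_passage_mean(3, 0.3))   # close to 3 / 0.3 = 10
```

For $N = 1$ this reduces to the mean $1/p$ of the time of the first success.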


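Example 4.1.2 If $\xi \in \Phi_{a,\sigma^2}$ then

$$\begin{aligned}
\mathbf{E}\xi &= \frac{1}{\sigma\sqrt{2\pi}} \int t\,e^{-\frac{(t-a)^2}{2\sigma^2}}\,dt \\
&= \frac{1}{\sigma\sqrt{2\pi}} \int (t - a)\,e^{-\frac{(t-a)^2}{2\sigma^2}}\,dt + \frac{a}{\sigma\sqrt{2\pi}} \int e^{-\frac{(t-a)^2}{2\sigma^2}}\,dt \\
&= \frac{1}{\sigma\sqrt{2\pi}} \int z\,e^{-\frac{z^2}{2\sigma^2}}\,dz + a = a.
\end{aligned}$$

Thus the parameter $a$ of the normal law is equal to the expectation of the latter.


Example 4.1.3 If $\xi \in \Pi_\mu$, then $\mathbf{E}\xi = \mu$. Indeed,

$$\mathbf{E}\xi = \sum_{k=0}^{\infty} k\,\frac{\mu^k}{k!}\,e^{-\mu} = \mu e^{-\mu} \sum_{k=1}^{\infty} \frac{\mu^{k-1}}{(k-1)!} = \mu.$$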


Example 4.1.4 If $\xi \in U_{0,1}$, then

$$\mathbf{E}\xi = \int_0^1 x\,dx = \frac{1}{2}.$$

It follows from property E1 that, for $\xi \in U_{a,b}$, one has

$$\mathbf{E}\xi = a + \frac{b - a}{2} = \frac{a + b}{2}.$$

If $\xi \in K_{0,1}$ then the expectation $\mathbf{E}\xi$ does not exist. That follows from the fact that the integral $\int \frac{x\,dx}{1 + x^2}$ diverges.


Example 4.1.5 We now consider an example that is, in a sense, close to Example 4.1.1 on the computation of $\mathbf{E}\eta$, but which is more complex and corresponds to computing the mean value of the duration of a chess tournament in a real-life situation. In Sect. 3.4 we described a simple probabilistic model of a chess tournament. The first player wins in a given game, independently of the outcomes of the previous games, with probability $p$, loses with probability $q$, $p + q \le 1$, and makes a tie with probability $1 - p - q$. Of course, this is a rather rough first approximation since in a real-life tournament there is apparently no independence. On the other hand, it is rather unlikely that, for balanced high level players, the above probabilities would substantially vary from game to game or depend on the outcomes of their previous results. A more complex model incorporating dependence of $p$ and $q$ on the outcomes of previous games will be considered in Example 13.4.2.

Assume that the tournament continues until one of the two participants wins $N$ games (then this player will be declared the winner). For instance, the 1984 individual World Championship match between A. Karpov and G. Kasparov was organised just according to this scheme with $N = 6$. What can one say about the expectation $\mathbf{E}\eta$ of the duration $\eta$ of the tournament?

As was shown in Example 3.3.1,

$$\mathbf{P}(\eta = n) = p \sum_{i=0}^{N-1} p(n - 1; N - 1, i) + q \sum_{i=0}^{N-1} p(n - 1; i, N - 1),$$

where

$$p(n; i, j) = \frac{n!}{i!\,j!\,(n - i - j)!}\,p^i q^j (1 - p - q)^{n-i-j}.$$

Therefore, under obvious conventions on the summation indices,

$$\mathbf{E}\eta = \sum_{i=0}^{N-1} \frac{p^N q^i + p^i q^N}{i!\,(N-1)!} \sum_{n=0}^{\infty} n(n-1)\cdots(n - N - i + 1)\,(1 - p - q)^{n-N-i}.$$

The sum over $n$ was calculated in Example 4.1.1 to be $(N + i)!/(p + q)^{N+i+1}$. Consequently,

$$\mathbf{E}\eta = N \sum_{i=0}^{N-1} \frac{(p^N q^i + p^i q^N)(N + i)!}{i!\,N!\,(p + q)^{N+i+1}} = \frac{N}{p + q} \sum_{i=0}^{N-1} \binom{N+i}{i} \bigl(r^N (1 - r)^i + r^i (1 - r)^N\bigr),$$

where $r = p/(p + q)$.




In his interview of 3 March 1985 to the newspaper “Izvestija”, Karpov said that in qualifying tournaments he would lose, on average, 1 game out of 20, and that Kasparov’s results were similar. If we put in our simple model $p = q = 1/20$ (strictly speaking, one cannot make such a conclusion from the given data; the relation $p = q = 1/20$ should be considered rather as one of many possible hypotheses) then, for $N = 6$, direct calculations show that ($r = 1/2$)

$$\mathbf{E}\eta = 120 \sum_{i=0}^{5} \binom{6+i}{i} \frac{1}{2^{6+i}} \approx 93.$$


Thus, provided that our simplest model is adequate, the expected duration of 
a tournament turns out to be very large. The fact that the match between Karpov 
and Kasparov was interrupted by the decision of the chairman of the World Chess 
Federation after 48 games because the match had dragged on, might serve as a 
confirmation of the correctness of the assumptions we made. 

Taking into account the results of the match and subsequent games between Karpov and Kasparov could lead to estimates (approximate values) for the quantities $p$ and $q$ that would differ from 1/20.

For our model, one also has the following simple inequality:

$$\frac{N}{p + q} \le \mathbf{E}\eta \le \frac{2N - 1}{p + q}.$$

It follows from the relation $\eta_N \le \eta \le \eta_{2N-1}$, where $\eta_N$ is the number of games until the time when the total of the points gained by both players reaches $N$. By virtue of Example 4.1.1, $\mathbf{E}\eta_N = N/(p + q)$.
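The closed-form expression for $\mathbf{E}\eta$ in terms of $r = p/(p+q)$ is easy to evaluate. The following is a minimal Python sketch; the function name is hypothetical, and the values $N = 6$, $p = q = 1/20$ are the hypothesis discussed in this example rather than established data. It also checks the two-sided bound $N/(p+q) \le \mathbf{E}\eta \le (2N-1)/(p+q)$.

```python
from math import comb

def expected_duration(wins_needed, p, q):
    """Expected number of games until one player collects `wins_needed` wins."""
    r = p / (p + q)
    series = sum(
        comb(wins_needed + i, i)
        * (r**wins_needed * (1 - r) ** i + r**i * (1 - r) ** wins_needed)
        for i in range(wins_needed)
    )
    return wins_needed / (p + q) * series

N, p, q = 6, 0.05, 0.05
duration = expected_duration(N, p, q)
lower, upper = N / (p + q), (2 * N - 1) / (p + q)
print(lower, duration, upper)  # roughly 60, 92.93, 110
```

With $q = 0$ the series reduces to its $i = 0$ term and the function returns $N/p$, in agreement with Example 4.1.1.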


Example 4.1.6 In the problem on cells in Sects. 1.3 and 1.4, we considered the probability that at least one of the cells in which $r$ particles are placed at random is empty. Find the expectation of the number $S_{n,r}$ of empty cells after $r$ particles have been placed. If $A_k$ denotes the event that the $k$-th cell is empty and $I(A_k)$ is the indicator of this event then

$$S_{n,r} = \sum_{k=1}^{n} I(A_k), \qquad \mathbf{E}S_{n,r} = \sum_{k=1}^{n} \mathbf{P}(A_k) = n\Bigl(1 - \frac{1}{n}\Bigr)^{r}.$$

Note now that $\mathbf{E}S_{n,r}$ is close to 0 if $(1 - 1/n)^r$ is small compared with $1/n$, i.e. when $-r\ln(1 - 1/n) - \ln n$ is large. For large $n$,

$$\ln\Bigl(1 - \frac{1}{n}\Bigr) = -\frac{1}{n} + O\Bigl(\frac{1}{n^2}\Bigr),$$

and the required relation will hold if $(r - n\ln n)/n$ is large. In our case (cf. property E4), the smallness of $\mathbf{E}S_{n,r}$ will clearly imply that of $\mathbf{P}(A) = \mathbf{P}(S_{n,r} > 0)$.
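For small $n$ and $r$ the formula $\mathbf{E}S_{n,r} = n(1 - 1/n)^r$ can be verified exactly by listing all $n^r$ equally likely placements. The following is a minimal Python sketch using exact rational arithmetic; the function names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

def expected_empty_cells_exact(n, r):
    """Average number of empty cells over all n**r equally likely placements."""
    total_empty = 0
    for placement in product(range(n), repeat=r):
        total_empty += n - len(set(placement))  # cells that received no particle
    return Fraction(total_empty, n**r)

def expected_empty_cells_formula(n, r):
    """The value n (1 - 1/n)^r obtained from the indicators I(A_k)."""
    return n * Fraction(n - 1, n) ** r

for n, r in ((3, 3), (4, 5), (5, 4)):
    print(n, r, expected_empty_cells_exact(n, r), expected_empty_cells_formula(n, r))
```

The two columns coincide exactly, even though the indicators $I(A_k)$ are dependent: additivity of the expectation does not require independence.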




4.2 Conditional Distribution Functions and Conditional 
Expectations 


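Let $(\Omega, \mathfrak{F}, \mathbf{P})$ be a probability space and $B \in \mathfrak{F}$ be an event with $\mathbf{P}(B) > 0$. Form a new probability space $(\Omega, \mathfrak{F}, \mathbf{P}_B)$, where $\mathbf{P}_B$ is defined for $A \in \mathfrak{F}$ by the equality

$$\mathbf{P}_B(A) := \mathbf{P}(A \mid B).$$

It is easy to verify that the probability properties P1, P2 and P3 hold for $\mathbf{P}_B$. Let $\xi$ be a random variable on $(\Omega, \mathfrak{F}, \mathbf{P})$. It is clearly a random variable on the space $(\Omega, \mathfrak{F}, \mathbf{P}_B)$ as well.


Definition 4.2.1 The expectation of $\xi$ in the space $(\Omega, \mathfrak{F}, \mathbf{P}_B)$ is called the conditional expectation of $\xi$ given $B$ and is denoted by $\mathbf{E}(\xi \mid B)$:

$$\mathbf{E}(\xi \mid B) = \int_\Omega \xi(\omega)\,\mathbf{P}_B(d\omega).$$

By the definition of the measure $\mathbf{P}_B$,

$$\mathbf{E}(\xi \mid B) = \int_\Omega \xi(\omega)\,\mathbf{P}(d\omega \mid B) = \frac{1}{\mathbf{P}(B)} \int_\Omega \xi(\omega)\,\mathbf{P}(d\omega \cap B) = \frac{1}{\mathbf{P}(B)} \int_B \xi(\omega)\,\mathbf{P}(d\omega).$$

The last integral differs from $\mathbf{E}\xi$ in that the integration in it is carried over the set $B$ only. We will denote this integral by

$$\mathbf{E}(\xi; B) := \int_B \xi(\omega)\,\mathbf{P}(d\omega),$$

so that

$$\mathbf{E}(\xi \mid B) = \frac{1}{\mathbf{P}(B)}\,\mathbf{E}(\xi; B).$$

It is not hard to see that the function

$$F(x \mid B) := \mathbf{P}_B(\xi < x) = \mathbf{P}(\xi < x \mid B)$$

is the distribution function of the random variable $\xi$ on $(\Omega, \mathfrak{F}, \mathbf{P}_B)$.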


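Definition 4.2.2 The function $F(x \mid B)$ is called the conditional distribution function of $\xi$ (in the “conditional” space $(\Omega, \mathfrak{F}, \mathbf{P}_B)$) given $B$.


The quantity $\mathbf{E}(\xi \mid B)$ can evidently be rewritten as

$$\int x\,dF(x \mid B).$$

If the $\sigma$-algebra $\sigma(\xi)$ generated by the random variable $\xi$ does not depend on the event $B$, then $\mathbf{P}_B(A) = \mathbf{P}(A)$ for any $A \in \sigma(\xi)$. Therefore, in that case

$$F(x \mid B) = F(x), \qquad \mathbf{E}(\xi \mid B) = \mathbf{E}\xi, \qquad \mathbf{E}(\xi; B) = \mathbf{P}(B)\,\mathbf{E}\xi. \tag{4.2.1}$$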




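Let $\{B_n\}$ be a (possibly finite) sequence of disjoint events such that $\bigcup_n B_n = \Omega$ and $\mathbf{P}(B_n) > 0$ for any $n$. Then

$$\begin{aligned}
\mathbf{E}\xi = \int_\Omega \xi(\omega)\,\mathbf{P}(d\omega) &= \sum_n \int_{B_n} \xi(\omega)\,\mathbf{P}(d\omega) \\
&= \sum_n \mathbf{E}(\xi; B_n) = \sum_n \mathbf{P}(B_n)\,\mathbf{E}(\xi \mid B_n).
\end{aligned} \tag{4.2.2}$$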

We have obtained the total probability formula for expectations. This formula can 
be rather useful. 
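Formula (4.2.2) can be traced on a finite sample space. The following is a minimal Python sketch under illustrative assumptions: a fair die, the random variable $\xi(\omega) = \omega^2$, and an arbitrary partition of the six outcomes into three events; all names are hypothetical.

```python
from fractions import Fraction

# Fair die: each elementary outcome has probability 1/6 (illustrative choice).
PROBABILITY = {w: Fraction(1, 6) for w in range(1, 7)}

def xi(w):
    return w * w

def prob(event):
    return sum(PROBABILITY[w] for w in event)

def expectation_on(event):
    """E(xi; B): the integral of xi over the event B only."""
    return sum(xi(w) * PROBABILITY[w] for w in event)

def conditional_expectation(event):
    """E(xi | B) = E(xi; B) / P(B)."""
    return expectation_on(event) / prob(event)

# Disjoint events B_1, B_2, B_3 whose union is the whole sample space.
partition = [{1, 2}, {3, 4, 5}, {6}]

total = sum(prob(B) * conditional_expectation(B) for B in partition)
direct = expectation_on(set(PROBABILITY))
print(total, direct)  # both equal 91/6
```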


Example 4.2.1 Let the lifetime of a device be a random variable € with a distribution 
function F(x). We know that the device has already worked for a units of time. 
What is the distribution of the residual lifetime? What is the expectation of the 
latter? 

Clearly, in this problem we have to find $\mathbf{P}(\xi - a \ge x \mid \xi \ge a)$ and $\mathbf{E}(\xi - a \mid \xi \ge a)$. Of course, it is assumed that

$$P(a) := \mathbf{P}(\xi \ge a) > 0.$$

By the above formulae,

$$\mathbf{P}(\xi - a \ge x \mid \xi \ge a) = \frac{P(x + a)}{P(a)}, \qquad \mathbf{E}(\xi - a \mid \xi \ge a) = \frac{1}{P(a)} \int_0^\infty x\,dF(x + a).$$

It is interesting to note the following. In many applied problems, especially when one deals with the operation of complex devices consisting of a large number of reliable parts, the distribution of $\xi$ can be assumed to be exponential:

$$P(x) = \mathbf{P}(\xi \ge x) = e^{-\mu x}, \qquad \mu > 0.$$

(The reason for this will become clear later, when considering the Poisson theorem and Poisson process. Computers could serve as examples of such devices.) But, for the exponential distribution, it turns out that the residual lifetime distribution

$$\mathbf{P}(\xi - a \ge x \mid \xi \ge a) = \frac{P(x + a)}{P(a)} = e^{-\mu x} = P(x) \tag{4.2.3}$$

coincides with the lifetime distribution of a new device. In other words, a new device and a device which has already worked without malfunction for some time $a$ are, from the viewpoint of their future failure-free operation time distributions, equivalent.

It is not hard to understand that the exponential distribution (along with its discrete analogue $\mathbf{P}(\xi = k) = q^k(1 - q)$, $k = 0, 1, \ldots$) is the only distribution possessing the above remarkable property. One can see that, from equality (4.2.3), we necessarily have

$$P(x + a) = P(x)P(a).$$
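The relation $P(x + a) = P(x)P(a)$ is easy to test for particular tail functions $P(x) = \mathbf{P}(\xi \ge x)$. The following is a minimal Python sketch; the parameter values are arbitrary, and the uniform law is included only as a contrasting case for which the property fails. All function names are assumptions made for illustration.

```python
import math

def exponential_tail(x, mu=1.5):
    """P(xi >= x) for the exponential law with parameter mu."""
    return math.exp(-mu * x)

def geometric_tail(k, q=0.6):
    """P(xi >= k) when P(xi = k) = q**k * (1 - q), k = 0, 1, ..."""
    return q**k

def uniform_tail(x):
    """P(xi >= x) for xi uniform on [0, 1] (a law without the property)."""
    return min(1.0, max(0.0, 1.0 - x))

def residual_tail(tail, x, a):
    """P(xi - a >= x | xi >= a) = P(x + a) / P(a)."""
    return tail(x + a) / tail(a)

print(residual_tail(exponential_tail, 0.7, 2.0), exponential_tail(0.7))
print(residual_tail(geometric_tail, 3, 4), geometric_tail(3))
print(residual_tail(uniform_tail, 0.3, 0.5), uniform_tail(0.3))
```

The first two lines print pairs of equal numbers; in the third line the residual tail is smaller than the original one, so a device with a uniformly distributed lifetime “ages”.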


Example 4.2.2 Assume that $n$ machines are positioned so that the distance between the $i$-th and $j$-th machines is $a_{i,j}$, $1 \le i, j \le n$. Each machine requires service from time to time (tuning, repair, etc.). Assume that the service is to be done by a single worker and that the probability that a given new call for service comes from the $j$-th machine is $p_j$ ($\sum_{j=1}^{n} p_j = 1$). If, for instance, the worker has just completed servicing the $i$-th machine, then with probability $p_j$ (not depending on $i$) the next machine to be served will be the $j$-th machine; the worker will then need to go to it and cover a distance of $a_{ij}$ units. What is the mean length of such a passage?

Let $B_i$ denote the event that the $i$-th machine was serviced immediately before a given passage. Then $\mathbf{P}(B_i) = p_i$, and the probability that the worker will move from the $i$-th machine to the $j$-th machine, $j = 1, \ldots, n$, is equal to $p_j$. The length $\xi$ of the passage is $a_{ij}$. Hence

$$\mathbf{E}(\xi \mid B_i) = \sum_{j=1}^{n} p_j a_{i,j},$$

and by the total probability formula

$$\mathbf{E}\xi = \sum_{i=1}^{n} \mathbf{P}(B_i)\,\mathbf{E}(\xi \mid B_i) = \sum_{i,j=1}^{n} p_i p_j a_{ij}.$$

The obtained expression enables one to compare different variants of positioning 
machines from the point of view of minimisation of the quantity E€ under given 
restrictions on a;;. For instance, if a;; = 1 and all the machines are of the same type 
(p; = 1/n) then, provided they are positioned along a straight line (with unit steps 
between them), one gets a;; = |j — i| and! 


i = i= n-1 1 
FE=— D> lj-il= 5 lk -b= (1+ 
n jal ne 3 n 


so that, for large n, the value of Eé is close to n/3. Thus, if there are s calls a day 
then the average total distance covered daily by the worker is approximately sn/3. 
It is easy to show that positioning machines around a circle would be better but still 
not optimal. 
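The comparison of layouts can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, and on the circle the worker is assumed to walk along the circle (shortest arc) with unit steps between neighbouring machines.

```python
from fractions import Fraction

def mean_passage(p, a):
    """E xi = sum over i, j of p_i * p_j * a_ij."""
    n = len(p)
    return sum(p[i] * p[j] * a[i][j] for i in range(n) for j in range(n))

def line_distances(n):
    # machines at the points 1, ..., n of a straight line
    return [[abs(j - i) for j in range(n)] for i in range(n)]

def circle_distances(n):
    # assumption: the worker moves along a circle of n unit steps, taking the shorter arc
    return [[min(abs(j - i), n - abs(j - i)) for j in range(n)] for i in range(n)]

n = 10
p = [Fraction(1, n)] * n                      # machines of the same type
print(mean_passage(p, line_distances(n)))     # equals (n*n - 1) / (3*n)
print(mean_passage(p, circle_distances(n)))   # smaller than the value for the line
```

Exact rational arithmetic is used so that the value for the line can be compared with the closed-form expression without rounding.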


Example 4.2.3 As was already noticed, not all random variables (distributions) have
expectations. The respective examples are by no means pathological: for instance,
the Cauchy distribution $K_{\alpha,\sigma}$ has this property. Now we will consider a problem on
random walks in which there also arise random variables having no expectations.
This is the problem on the so-called fair game. Two gamblers take part in the game.
The initial capital of the first gambler is $z$ units. This gambler wins or loses each


¹ To compute the sum, it suffices to note that

\[ \sum_{k=1}^{n-1} k(k - 1) = \frac{1}{3} (n - 2)(n - 1) n \]

(compare the initial values and increments of both sides).


4.2 Conditional Distribution Functions and Conditional Expectations 73 


play of the game with probability 1/2 independently of the outcomes of the previous
plays, his capital increasing or decreasing by one unit, respectively. Let $z + S_k$ be
the capital of the first gambler after the $k$-th play, and let $\eta(z)$ be the number of steps until
his ruin in the game versus an infinitely rich adversary, i.e.

\[ \eta(z) := \min\{k : z + S_k = 0\}, \qquad \eta(0) = 0. \]

If $\inf_k S_k > -z$ (i.e. the first gambler is never ruined), we put $\eta(z) = \infty$.

First we show that $\eta(z)$ is a proper random variable, i.e. a random variable
assuming finite values with probability 1. For the first gambler, this will mean that
he goes bankrupt with probability 1 whatever his initial capital $z$ is. Here one could
take $\Omega$ to be the “sample” space consisting of all possible sequences made up of
1 and $-1$. Each such sequence $\omega$ would describe a “trajectory” of the game. (For
example, $-1$ in the $k$-th place means that the first gambler lost the $k$-th play.) We
leave it to the reader as an exercise to complete the construction of the probability
space $(\Omega, \mathfrak{F}, P)$. Clearly, one has to do this so that the probability of any first $n$
outcomes of the game (the first $n$ components of $\omega$ are fixed) is equal to $2^{-n}$.

Put

\[ u(z) := P(\eta(z) < \infty), \qquad u(0) := 1, \]

and denote by $B_1$ the event that the first component of $\omega$ is 1 (the gambler won in
the first play) and by $B_2$ the event that this component is $-1$ (the gambler lost). Noticing that
$P(\eta(z) < \infty \mid B_1) = u(z + 1)$ (if the first play is won, the capital becomes $z + 1$), we
obtain by the total probability formula that, for $z \ge 1$,

\[ u(z) = P(B_1) P(\eta(z) < \infty \mid B_1) + P(B_2) P(\eta(z) < \infty \mid B_2) = \frac{1}{2} u(z + 1) + \frac{1}{2} u(z - 1). \]

Putting $\delta(z) := u(z + 1) - u(z)$, $z \ge 0$, we conclude from here that $\delta(z) - \delta(z - 1) = 0$,
and hence $\delta(z) = \delta = \text{const}$. Since

\[ u(z + 1) = u(0) + \sum_{k=0}^{z} \delta(k) = u(0) + (z + 1)\delta \]

and $0 \le u(z + 1) \le 1$ for all $z$, it is evident that $\delta$ can be nothing but 0, so that $u(z) = 1$ for all $z$.

Thus, in a game against an infinitely rich adversary, the gambler will be ruined
with probability 1. This explains, to some extent, the fact that all reckless gamblers
(not stopping “at the right time”; choosing this “right time” is a separate problem)
go bankrupt sooner or later, even if the game is fair.

We show now that although $\eta(z)$ is a proper random variable, $E\eta(z) = \infty$. Assume
the contrary:

\[ v(z) := E\eta(z) < \infty. \]

Similarly to the previous argument, we notice that $E(\eta(z) \mid B_1) = 1 + v(z + 1)$ (the
capital became $z + 1$, one play has already been played). Therefore by the total
probability formula we find for $z \ge 1$ that

\[ v(z) = \frac{1}{2} \bigl( 1 + v(z + 1) \bigr) + \frac{1}{2} \bigl( 1 + v(z - 1) \bigr), \qquad v(0) = 0. \]




It can be seen from this formula that if $v(z) < \infty$, then $v(k) < \infty$ for all $k$. Set
$\Delta(z) := v(z + 1) - v(z)$. Then the last equality can be written down for $z \ge 1$ as

\[ -1 = \frac{1}{2} \Delta(z) - \frac{1}{2} \Delta(z - 1), \]

or

\[ \Delta(z) = \Delta(z - 1) - 2. \]

From this equality we find that $\Delta(z) = \Delta(0) - 2z$. Therefore

\[ v(z) = \sum_{k=0}^{z-1} \Delta(k) = z\Delta(0) - z(z - 1) = z v(1) - z(z - 1). \]

It follows that $E\eta(z) < 0$ for sufficiently large $z$. But $\eta(z)$ is a positive random
variable and hence $E\eta(z) > 0$. The contradiction shows that the assumption on the
finiteness of the expectation of $\eta(z)$ is wrong.
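Both conclusions of the example can be observed on exact finite-horizon quantities. The sketch below (the function name `ruin_by` is hypothetical, and $z \ge 1$ is assumed) computes $P(\eta(z) \le n)$ and $E\min(\eta(z), n)$ by propagating the distribution of the capital over the paths that have not yet been ruined.

```python
def ruin_by(z, n):
    """Return (P(eta(z) <= n), E min(eta(z), n)) for the fair +-1 game, z >= 1."""
    alive = {z: 1.0}          # capital -> probability, over paths not yet ruined
    expected_capped = 0.0     # accumulates sum_{k < n} P(eta > k) = E min(eta, n)
    for _ in range(n):
        expected_capped += sum(alive.values())
        step = {}
        for capital, prob in alive.items():
            for nxt in (capital - 1, capital + 1):
                if nxt > 0:   # paths that hit 0 are ruined and dropped
                    step[nxt] = step.get(nxt, 0.0) + prob / 2
        alive = step
    return 1.0 - sum(alive.values()), expected_capped

for n in (100, 400, 1600):
    print(n, ruin_by(1, n))
```

As $n$ grows, the first component approaches 1 (ruin is certain), while the second keeps growing without bound, in agreement with $E\eta(z) = \infty$.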


4.3 Expectations of Functions of Independent Random Variables 


Theorem 4.3.1


1. Let $\xi$ and $\eta$ be independent random variables and $g(x, y)$ be a Borel function.
If $g \ge 0$ or $Eg(\xi, \eta)$ is finite, then

\[ Eg(\xi, \eta) = E \bigl[ Eg(x, \eta) \big|_{x = \xi} \bigr]. \tag{4.3.1} \]

2. Let $g(x, y) = g_1(x) g_2(y)$. If $g_1(\xi) \ge 0$ and $g_2(\eta) \ge 0$, or both $Eg_1(\xi)$ and
$Eg_2(\eta)$ exist, then

\[ Eg(\xi, \eta) = Eg_1(\xi) \, Eg_2(\eta). \tag{4.3.2} \]

The expectation $Eg(\xi, \eta)$ exists if and only if both $Eg_1(\xi)$ and $Eg_2(\eta)$ exist. (We
exclude here the trivial cases $P(g_1(\xi) = 0) = 1$ and $P(g_2(\eta) = 0) = 1$ to avoid
trivial complications.)


Proof The first assertion of the theorem is a paraphrase of Fubini’s theorem in terms
of expectations. The first part of the second assertion follows immediately from
Corollary 3.6.1 of Fubini’s theorem. Since $|g_1(\xi)| \ge 0$ and $|g_2(\eta)| \ge 0$ and these
random variables are independent, one has

\[ E|g_1(\xi) g_2(\eta)| = E|g_1(\xi)| \, E|g_2(\eta)|. \]

Now the last assertion of the theorem follows immediately, for one clearly has

\[ E|g_1(\xi)| \ne 0, \qquad E|g_2(\eta)| \ne 0. \]


Remark 4.3.1 Formula (4.3.1) could be considered as the total probability formula
for computing the expectation $Eg(\xi, \eta)$. Assertion (4.3.2) could be written down
without loss of generality in the form

\[ E\xi\eta = E\xi \, E\eta. \tag{4.3.3} \]




To get (4.3.2) from this, one has to take $g_1(\xi)$ instead of $\xi$ and $g_2(\eta)$ instead of
$\eta$; these will again be independent random variables.

Examples of the use of Theorem 4.3.1 were given in Sect. 3.6 and will be appearing
in the sequel.

The converse to (4.3.2) or (4.3.3) does not hold. There exist dependent random
variables $\xi$ and $\eta$ such that

\[ E\xi\eta = E\xi \, E\eta. \]

Let, for instance, $\zeta$ and $\xi$ be independent and $E\xi = E\zeta = 0$. Put $\eta = \xi\zeta$. Then $\xi$ and
$\eta$ are clearly dependent (excluding some trivial cases when, say, $\xi = \text{const}$), but

\[ E\xi\eta = E\xi^2\zeta = E\xi^2 \, E\zeta = 0 = E\xi \, E\eta. \]
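A concrete instance of this construction can be checked by exact enumeration. The distributions below are an illustrative assumption: $\xi$ uniform on $\{-2, -1, 1, 2\}$ and $\zeta$ uniform on $\{-1, 1\}$ (with $\xi = \pm 1$ only, $\eta = \xi\zeta$ would turn out to be independent of $\xi$, one of the trivial cases).

```python
from fractions import Fraction
from itertools import product

xi_values = [-2, -1, 1, 2]      # E xi = 0
zeta_values = [-1, 1]           # E zeta = 0
weight = Fraction(1, len(xi_values) * len(zeta_values))

# joint law of (xi, eta) with eta = xi * zeta
joint = {}
for x, z in product(xi_values, zeta_values):
    joint[(x, x * z)] = joint.get((x, x * z), 0) + weight

def expect(f):
    return sum(p * f(x, y) for (x, y), p in joint.items())

e_product = expect(lambda x, y: x * y)                  # E(xi * eta)
e_xi, e_eta = expect(lambda x, y: x), expect(lambda x, y: y)
p_joint = joint.get((2, 2), 0)                          # P(xi = 2, eta = 2)
p_xi = sum(p for (x, _), p in joint.items() if x == 2)
p_eta = sum(p for (_, y), p in joint.items() if y == 2)

print(e_product == e_xi * e_eta)   # the expectations factorise
print(p_joint == p_xi * p_eta)     # the probabilities do not
```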


4.4 Expectations of Sums of a Random Number of Random 
Variables 


Assume that a sequence $\{\xi_n\}_{n=1}^\infty$ of independent random variables (or random vectors)
and an integer-valued random variable $\nu \ge 0$ are given on a probability space
$(\Omega, \mathfrak{F}, P)$.

Property E2 of expectations implies that, for sums $S_n = \sum_{i=1}^n \xi_i$, the following
equality holds:

\[ ES_n = \sum_{i=1}^n E\xi_i. \]

In particular, if $a_k = E\xi_k = a$ do not depend on $k$ then $ES_n = an$.

What can be said about the expectation of the sum $S_\nu$ of the random number $\nu$
of random variables $\xi_1, \xi_2, \dots$? To answer this question we need to introduce some
new notions.

Let $\mathfrak{F}_{k,n} := \sigma(\xi_k, \dots, \xi_n)$ be the $\sigma$-algebra generated by the $n - k + 1$ random
variables $\xi_k, \dots, \xi_n$.


Definition 4.4.1 A random variable $\nu$ is said to be independent of the future if the
event $\{\nu \le n\}$ does not depend on $\mathfrak{F}_{n+1,\infty}$.


Let, further, a family of embedded $\sigma$-algebras $\mathfrak{F}_n$: $\mathfrak{F}_n \subset \mathfrak{F}_{n+1}$ be given, such that
$\mathfrak{F}_{1,n} = \sigma(\xi_1, \dots, \xi_n) \subset \mathfrak{F}_n$.


Definition 4.4.2 A random variable $\nu$ is said to be a Markov (or stopping) time with
respect to the family $\{\mathfrak{F}_n\}$, if $\{\nu \le n\} \in \mathfrak{F}_n$.


Often $\mathfrak{F}_n$ is taken to be $\mathfrak{F}_{1,n} = \sigma(\xi_1, \dots, \xi_n)$. We will call a stopping time with respect
to $\mathfrak{F}_{1,n}$ simply a stopping (or Markov) time without indicating the corresponding
family of $\sigma$-algebras. In this case, knowing the values of $\xi_1, \dots, \xi_n$ enables us
to say whether the event $\{\nu \le n\}$ has occurred or not.

If the $\xi_k$ are independent (the $\sigma$-algebras $\mathfrak{F}_{1,n}$ and $\mathfrak{F}_{n+1,\infty}$ are independent) then
the requirement of independence of the future is wider than the Markov property,



because if $\nu$ is a stopping time with respect to $\{\mathfrak{F}_{1,n}\}$ then, evidently, the random
variable $\nu$ does not depend on the future.

As for a converse statement, one can only assert the following. If $\nu$ does not
depend on the future and the $\xi_k$ are independent then one can construct a family of
embedded $\sigma$-algebras $\{\mathfrak{F}_n\}$, $\mathfrak{F}_n \supset \mathfrak{F}_{1,n}$, such that $\nu$ is a stopping time with respect
to $\mathfrak{F}_n$ ($\{\nu \le n\} \in \mathfrak{F}_n$) and $\mathfrak{F}_n$ does not depend on $\mathfrak{F}_{n+1,\infty}$. As $\mathfrak{F}_n$, we can take the
$\sigma$-algebra generated by $\mathfrak{F}_{1,n}$ and the events $\{\nu = k\}$ for $k \le n$. For instance, a random
variable $\nu$ independent of $\{\xi_i\}$ clearly does not depend on the future, but is not a
stopping time with respect to $\{\mathfrak{F}_{1,n}\}$. Such $\nu$ will be a stopping time only with respect to the family $\{\mathfrak{F}_n\}$
constructed above.

It should be noted that, formally, any random variable can be made a stopping
time using the above construction (but, generally speaking, there will be no independence
of $\mathfrak{F}_n$ and $\mathfrak{F}_{n+1,\infty}$). However, such a construction is unsubstantial and not
particularly useful. In all the examples below the variables $\nu$ not depending on the
future are stopping times defined in a rather natural way.


Example 4.4.1 Let $\nu$ be the number of the first random variable in the sequence
$\{\xi_n\}_{n=1}^\infty$ which is greater than or equal to $N$, i.e. $\nu = \inf\{k : \xi_k \ge N\}$. Clearly, $\nu$ is a
stopping time, since

\[ \{\nu \le n\} = \bigcup_{k=1}^n \{\xi_k \ge N\} \in \mathfrak{F}_{1,n}. \]

If the $\xi_k$ are independent, then evidently $\nu$ is independent of the future.
The same can be said about the random variable

\[ \eta(t) := \min\{k : S_k \ge N\}, \qquad S_k = \sum_{j=1}^k \xi_j. \]

Note that the random variables $\nu$ and $\eta(t)$ may be improper (e.g., $\eta(t)$ is not defined
on the event $\{S := \sup_k S_k < N\}$). The random variable $\theta := \min\{k : S_k = S\}$ is not a
stopping time and depends on the future.

The term “Markov” random variable (or Markov time) will become clearer after
introducing the notion of Markovity in Chap. 13. The term “stopping time” is related
to the nature of a large number of applied problems in which such random variables
arise. As a typical example, the following procedure could be considered. Let $\xi_k$ be
the number of defective items in the $k$-th lot produced by a factory. Statistical quality
control is carried out as follows. The whole production is rejected if, in sequential
testing of the lots, it turns out that, for some $n$, the value of the sum

\[ S_n = \sum_{k=1}^n \xi_k \]

exceeds a given admissible level $a + bn$. The lot number $\nu$ for which this happens,

\[ \nu := \min\{n : S_n > a + bn\}, \]




is a stopping time for the whole testing procedure. To avoid a lengthy testing, one
also introduces a (literal) stopping time

\[ \nu^* := \min\{n : S_n < -A + bn\}, \]

where $A > 0$ is chosen so large as to guarantee, with a high probability, a sufficient
quality level for the whole production (assuming, say, that the $\xi_k$ are identically distributed).
It is clear that $\nu$ and $\nu^*$ both satisfy the definition of a Markov or stopping
time.


Consider the sum $S_\nu = \xi_1 + \cdots + \xi_\nu$ of a random number of random variables.
This sum is also called a stopped sum in the case when $\nu$ is a stopping time.


Theorem 4.4.1 (Kolmogorov–Prokhorov) Let an integer-valued random variable $\nu$
be independent of the future. If

\[ \sum_{k=1}^\infty P(\nu \ge k) E|\xi_k| < \infty \tag{4.4.1} \]

then

\[ ES_\nu = \sum_{k=1}^\infty P(\nu \ge k) E\xi_k. \tag{4.4.2} \]

If $\xi_k \ge 0$ then condition (4.4.1) is superfluous.


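Proof The summand $\xi_k$ is present in the sum $S_\nu$ if and only if the event $\{\nu \ge k\}$
occurs. Thus the following representation holds for the sum $S_\nu$:

\[ S_\nu = \sum_{k=1}^\infty \xi_k I(\nu \ge k), \]

where $I(B)$ is the indicator of the event $B$. Put $S_{\nu,n} := \sum_{k=1}^n \xi_k I(\nu \ge k)$. If $\xi_k \ge 0$
then $S_{\nu,n} \uparrow S_\nu$ for each $\omega$ as $n \to \infty$, and hence, by the monotone convergence
theorem (see Theorem A3.3.1 in Appendix 3),

\[ ES_\nu = \lim_{n \to \infty} ES_{\nu,n} = \lim_{n \to \infty} \sum_{k=1}^n E\xi_k I(\nu \ge k). \]

But the event $\{\nu \ge k\}$ complements the event $\{\nu \le k - 1\}$ and therefore does not
depend on $\sigma(\xi_k, \xi_{k+1}, \dots)$ and, in particular, on $\sigma(\xi_k)$. Hence, putting $a_k := E\xi_k$, we
get $E\xi_k I(\nu \ge k) = a_k P(\nu \ge k)$, and

\[ ES_\nu = \lim_{n \to \infty} \sum_{k=1}^n a_k P(\nu \ge k) = \sum_{k=1}^\infty a_k P(\nu \ge k). \tag{4.4.3} \]

This proves (4.4.2) for $\xi_k \ge 0$.
Now assume $\xi_k$ can take values of both signs. Put

\[ \xi_k^* := |\xi_k|, \qquad a_k^* := E\xi_k^*, \qquad Z_n := \sum_{k=1}^n \xi_k^*, \qquad Z_{\nu,n} := \sum_{k=1}^n \xi_k^* I(\nu \ge k). \]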




Applying (4.4.3), we obtain by virtue of (4.4.1) that

\[ EZ_\nu = \sum_{k=1}^\infty a_k^* P(\nu \ge k) < \infty. \]

Since $|S_{\nu,n}| \le Z_{\nu,n} \le Z_\nu$, by the dominated convergence theorem (see Corollary 6.1.3
or (the Fatou–Lebesgue) Theorem A3.3.2 in Appendix 3) we have

\[ ES_\nu = \lim_{n \to \infty} ES_{\nu,n} = \sum_{k=1}^\infty a_k P(\nu \ge k), \]

where the series on the right-hand side absolutely converges by virtue of (4.4.1).
The theorem is proved.
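Identity (4.4.2) can be verified by brute force on a small example. The sketch below is an illustration under assumptions of our own choosing: the $\xi_k$ are independent Bernoulli variables with different success probabilities $p_k$, and $\nu$ is the number of the first $\xi_k$ equal to 1, capped at the length $n$ of the sequence (a stopping time). The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def check_stopped_sum(success_probs):
    """Return (E S_nu, sum_k P(nu >= k) E xi_k) for xi_k ~ Bernoulli(p_k),
    nu = min(first k with xi_k = 1, n), n = len(success_probs)."""
    n = len(success_probs)
    lhs = Fraction(0)                      # E S_nu, by enumerating all outcomes
    p_nu_ge = [Fraction(0)] * (n + 2)      # p_nu_ge[k] = P(nu >= k)
    for outcome in product((0, 1), repeat=n):
        prob = Fraction(1)
        for x, p in zip(outcome, success_probs):
            prob *= p if x else 1 - p
        nu = outcome.index(1) + 1 if 1 in outcome else n
        lhs += prob * sum(outcome[:nu])    # S_nu = xi_1 + ... + xi_nu
        for k in range(1, nu + 1):
            p_nu_ge[k] += prob
    rhs = sum(p_nu_ge[k] * success_probs[k - 1] for k in range(1, n + 1))
    return lhs, rhs

print(check_stopped_sum([Fraction(1, 2), Fraction(1, 3), Fraction(1, 5)]))
```

Both components coincide, although the $E\xi_k$ differ and $\nu$ depends on the $\xi_k$.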


Put

\[ a^* := \max_k a_k, \qquad a_* := \min_k a_k, \]

where, as above, $a_k = E\xi_k$.
Theorem 4.4.2 Let $\sup_k E|\xi_k| < \infty$ and $\nu$ be a random variable which does not
depend on the future. Then the following assertions hold true.
(a) If $E\nu < \infty$ (or $EZ_\nu < \infty$, where $Z_n = \sum_{k=1}^n |\xi_k|$) then $ES_\nu$ exists and

\[ a_* E\nu \le ES_\nu \le a^* E\nu. \tag{4.4.4} \]

(b) If $ES_\nu$ is well defined (and may be $\pm\infty$), $a_* > 0$ and, for any $N \ge 1$,

\[ E(S_N - a_* N; \, \nu > N) \le c, \]

where $c$ does not depend on $N$, then (4.4.4) holds true.
(c) If $\xi_k \ge 0$ then (4.4.4) is always valid.


If S\ = const a.s. then condition (c) clearly implies (b). 
The case $a^* < 0$ in assertions (b)-(c) can be treated in exactly the same way.
If $\nu$ does not depend on $\{\xi_k\}$ and $a_* = a^* = a > 0$, then $E(S_N; \nu > N) = aN P(\nu > N)$
and the condition in (b) holds. But the assumption $a_* = a^*$ is inessential here,
and, for independent $\nu$ and $\{\xi_k\}$, (4.4.4) is always true, since in this case

\[ ES_\nu = \sum_k P(\nu = k) ES_k \le a^* \sum_k k P(\nu = k) = a^* E\nu. \]

The reverse inequality $ES_\nu \ge a_* E\nu$ is verified in the same way.


Proof of Theorem 4.4.2
(a) First note that

\[ \sum_{k=1}^\infty P(\nu \ge k) = \sum_{k=1}^\infty \sum_{i=k}^\infty P(\nu = i) = \sum_{i=1}^\infty i P(\nu = i) = E\nu. \]

Note also that, for $E|\xi_k| \le c < \infty$, the condition $E\nu < \infty$ (or $EZ_\nu < \infty$) turns into
condition (4.4.1), and assertion (4.4.4) follows from (4.4.2). Therefore, if $E\nu < \infty$




then Theorem 4.4.2 is a direct consequence of Theorem 4.4.1. The same is true in
case (d).

Consider now assertions (b) and (c).

For a fixed $N > 0$, introduce the random variable

\[ \nu_N := \min(\nu, N), \]

which, together with $\nu$, does not depend on the future. Indeed, if $n < N$ then the
event $\{\nu_N \le n\} = \{\nu \le n\}$ does not depend on $\mathfrak{F}_{n+1,\infty}$. If $n \ge N$ then the event
$\{\nu_N \le n\}$ is certain and hence it too does not depend on $\mathfrak{F}_{n+1,\infty}$.

(b) If $E\nu < \infty$ then (4.4.4) is proved. Now let $E\nu = \infty$. We have to prove that
$ES_\nu = \infty$. Since $E\nu_N < \infty$, the relations

\[ ES_{\nu_N} = E(S_\nu; \nu \le N) + E(S_N; \nu > N) \ge a_* \bigl( E(\nu; \nu \le N) + N P(\nu > N) \bigr) \tag{4.4.5} \]

are valid by (a). Together with the conditions in (b) this implies that

\[ E(S_\nu; \nu \le N) \ge a_* E(\nu; \nu \le N) - c \to \infty \]

as $N \to \infty$. Since $ES_\nu$ is well defined, we have

\[ E(S_\nu; \nu \le N) \to ES_\nu \]

as $N \to \infty$ (see Corollary A3.2.1 in Appendix 3). Therefore necessarily $ES_\nu = \infty$.
(c) Here it is again sufficient to show that $ES_\nu = \infty$ in the case when $E\nu = \infty$. It
follows from (4.4.5) that

\[ ES_\nu = E(S_\nu; \nu \le N) + E(S_\nu; \nu > N) \ge E\bigl[ S_\nu - (S_N - a_* N); \, \nu > N \bigr] + a_* E(\nu; \nu \le N) \ge a_* E(\nu; \nu \le N) - c \to \infty \]

as $N \to \infty$, and thus $ES_\nu = \infty$.
The theorem is proved.


Theorem 4.4.2 implies the following famous result. 


Theorem 4.4.3 (Wald’s identity) Assume $a = E\xi_k$ does not depend on $k$,
$\sup_k E|\xi_k| < \infty$, and a random variable $\nu$ is independent of the future. Then, under
at least one of the conditions (a)-(d) of Theorem 4.4.2 (with $a_*$ replaced by $a$),

\[ ES_\nu = aE\nu. \tag{4.4.6} \]

If $a = 0$ and $E\nu = \infty$ then identity (4.4.6) can fail to hold, since there would be an
ambiguity of type $0 \cdot \infty$ on the right-hand side of (4.4.6).


Remark 4.4.1 If there is no independence of the future then equality (4.4.6) is, generally
speaking, untrue. Let, for instance, $a = E\xi_k < 0$, $\theta := \min\{k : S_k = S\}$ and
$S := \sup_k S_k$ (see Example 4.4.1; see Chaps. 10-12 for conditions of finiteness of
$ES$ and $E\theta$). Then $S_\theta = S \ge 0$ and $ES \ge 0$, while $aE\theta < 0$. Hence, (4.4.6) cannot
hold true for $\nu = \theta$.

We saw that if there is no assumption on the finiteness of $E\nu$ then, even in the case
$a > 0$, in order for (4.4.6) to hold, additional conditions are needed, e.g., conditions



(b)-(d). Without these conditions identity (4.4.6) is, generally speaking, not valid, 
as shown by the following example. 


Example 4.4.2 Let the random variables $\zeta_k$ be independent and identically distributed,
and

\[ E\zeta_k = 0, \qquad E\zeta_k^2 = 1, \qquad E|\zeta_k|^3 = \mu < \infty, \]
\[ \xi_k := 1 + \sqrt{2k} \, \zeta_k, \qquad \nu := \min\{k : S_k < 0\}. \]

We will show below in Example 20.2.1 that $\nu$ is a proper random variable, i.e.

\[ P(\nu < \infty) = P\Bigl( \bigcup_{n=1}^\infty \{S_n < 0\} \Bigr) = 1. \]

It is also clear that $\nu$ is a Markov time independent of the future and $E\xi_k = a = 1$.
But one has $ES_\nu < 0$, while $aE\nu > 0$, and hence equality (4.4.6) cannot be valid.
(Here necessarily $E\nu = \infty$, since otherwise condition (a) would be satisfied and
(4.4.6) would hold.)

However, if the $\xi_k$ are independent and identically distributed and $\nu$ is a stopping
time then statement (4.4.6) is always valid whenever its right-hand side is well
defined. We will show this below in Theorem 11.3.2 by virtue of the laws of large
numbers.

Conditions (b) and (c) in Theorem 4.4.2 were used in the case $E\nu = \infty$. However,
in some problems these conditions can be used to prove the finiteness of $E\nu$. The
following example confirms this observation.


Example 4.4.3 Let $\xi_1, \xi_2, \dots$ be independent and identically distributed and
$a = E\xi_1 > 0$. For a fixed $t > 0$, consider, as a stopping time (and a variable independent
of the future), the random variable

\[ \nu = \eta(t) := \min\{k : S_k \ge t\}. \]

Clearly, $S_N < t$ on the set $\{\eta(t) > N\}$ and $S_{\eta(t)} \ge t$. Therefore conditions (b) and (c)
are satisfied, and hence

\[ ES_{\eta(t)} = aE\eta(t). \]

We now show that $E\eta(t) < \infty$. In order to do this, we consider the “trimmed”
random variables $\xi_k^{(N)} := \min(N, \xi_k)$ and choose $N$ large enough for the inequality
$a^{(N)} := E\xi_k^{(N)} > 0$ to hold true. Let $S_k^{(N)}$ and $\eta^{(N)}(t)$ be defined similarly to $S_k$ and
$\eta(t)$, but for the sequence $\{\xi_k^{(N)}\}$. Then evidently $S^{(N)}_{\eta^{(N)}(t)} \le t + N$, $\eta(t) \le \eta^{(N)}(t)$,

\[ a^{(N)} E\eta^{(N)}(t) \le t + N, \qquad E\eta(t) \le \frac{t + N}{a^{(N)}} < \infty. \]

If $a = 0$ then $E\eta(t) = \infty$. This can be seen from the fair game example ($\xi_k = \pm 1$
with probability 1/2; see Example 4.2.3). In the general case, this will be shown below
in Chap. 12. As was noted above, in this case the right-hand side of (4.4.6) turns




into the indeterminacy $0 \cdot \infty$, but the left-hand side may equal any finite number, as
in the case of the fair game where $S_{\eta(t)} = t$.
If we take $\nu$ to be the Markov time

\[ \nu = \mu(t) := \min\{k : |S_k| \ge t\}, \]

where $\xi_k$ may assume values of both signs, then, to prove (4.4.6), it is apparently
easier to verify the condition of assertion (a) in Theorem 4.4.2. Let us show that
$E\mu(t) < \infty$. It is clear that, for any given $t > 0$, there exists an $N$ such that

\[ q := \min\bigl[ P(S_N > 2t), \, P(S_N < -2t) \bigr] > 0. \]

($N = 1$ if the $\xi_k$ are bounded from below.) For such $N$,

\[ \inf_{|v| \le t} P(|v + S_N| > t) \ge 2q. \]

Hence, in each $N$ steps, the random walk $\{S_k\}$ has a chance to leave the strip $|v| \le t$
with probability at least $2q$, whatever point $v$, $|v| \le t$, it starts from. Therefore,

\[ P(\mu(t) > kN) = P\Bigl( \max_{j \le kN} |S_j| < t \Bigr) \le P\Bigl( \bigcap_{j=1}^k \{ |S_{jN}| < t \} \Bigr) \le (1 - 2q)^k. \]

This implies that $P(\mu(t) > kN)$ decreases exponentially as $k$ grows and that $E\mu(t)$
is finite.


Example 4.4.4 A chain reaction scheme. Suppose we have a single initial particle
which either disappears with probability $q$ or turns into $m$ similar particles with
probability $p = 1 - q$. Each particle from the new generation behaves in the same
way independently of the fortunes of the other particles. What is the expectation of
the number $\zeta_n$ of particles in the $n$-th generation?

Consider the “double sequence” $\{\xi_k^{(n)}\}_{k=1,\,n=1}^{\infty,\,\infty}$ of independent identically distributed
random variables assuming the values $m$ and 0 with probabilities $p$ and $q$,
respectively. The sequences $\{\xi_k^{(1)}\}_{k=1}^\infty, \{\xi_k^{(2)}\}_{k=1}^\infty, \dots$ will clearly be mutually independent.
Using these sequences, one can represent the variables $\zeta_n$ ($\zeta_0 = 1$) as

\[ \zeta_1 = \xi_1^{(1)}, \qquad \zeta_2 = \xi_1^{(2)} + \cdots + \xi_{\zeta_1}^{(2)}, \qquad \dots, \qquad \zeta_n = \xi_1^{(n)} + \cdots + \xi_{\zeta_{n-1}}^{(n)}, \]

where the number of summands in the equation for $\zeta_n$ is $\zeta_{n-1}$, the number of “parent
particles”. Since the sequence $\{\xi_k^{(n)}\}$ is independent of $\zeta_{n-1}$, $\xi_k^{(n)} \ge 0$, and $E\xi_k^{(n)} = pm$,
by virtue of Wald’s identity we have

\[ E\zeta_n = E\xi_1^{(n)} \, E\zeta_{n-1} = pm \, E\zeta_{n-1} = (pm)^n. \]
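The recursion for the generation sizes can be followed exactly for small $n$. The sketch below (function names are hypothetical) builds the law of the $n$-th generation size step by step, using the fact that, given the number of parents, the number of particles that multiply is binomial, and then compares the mean with $(pm)^n$.

```python
from fractions import Fraction
from math import comb

def generation_law(n, m, p):
    """Exact law of the n-th generation size when each particle leaves
    m offspring with probability p and none with probability 1 - p."""
    law = {1: Fraction(1)}                     # generation 0: one particle
    for _ in range(n):
        nxt = {}
        for parents, prob in law.items():
            # j of the `parents` particles multiply, giving m * j particles
            for j in range(parents + 1):
                w = prob * comb(parents, j) * p**j * (1 - p)**(parents - j)
                nxt[m * j] = nxt.get(m * j, Fraction(0)) + w
        law = nxt
    return law

def mean(law):
    return sum(size * prob for size, prob in law.items())

p, m, n = Fraction(2, 3), 2, 3
law = generation_law(n, m, p)
print(mean(law), (p * m) ** n)   # the two values coincide
```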




Example 4.4.5 We return to the fair game of two gamblers described in Example 4.2.3,
but now assume that the respective capitals $z_1 > 0$ and $z_2 > 0$ of the
gamblers are finite. Introduce random variables $\xi_k$ representing the gains of the first
gambler in the respective ($k$-th) play. The variables $\xi_k$ are obviously independent,
and

\[ \xi_k = \begin{cases} 1 & \text{with probability } 1/2, \\ -1 & \text{with probability } 1/2. \end{cases} \]

The quantity $z_1 + S_k = z_1 + \sum_{j=1}^k \xi_j$ will be the capital of the first gambler and
$z_2 - S_k$ the capital of the second gambler after $k$ plays. The quantity

\[ \eta := \min\{k : z_1 + S_k = 0 \ \text{or} \ z_2 - S_k = 0\} \]

is the time until the end of the game, i.e. until the ruin of one of the gamblers. The
question is what is the probability $P_i$ that the $i$-th gambler wins (for $i = 1, 2$)?

Clearly, $\eta$ is a Markov time, $S_\eta = -z_1$ with probability $P_2$ and $S_\eta = z_2$ with
probability $P_1 = 1 - P_2$. Therefore,

\[ ES_\eta = P_1 z_2 - P_2 z_1. \]

If $E\eta < \infty$, then by Wald’s identity we have

\[ P_1 z_2 - P_2 z_1 = E\eta \, E\xi_1 = 0. \]

From this we find that $P_i = z_i / (z_1 + z_2)$.

It remains to verify that $E\eta$ is finite. Let, for the sake of simplicity, $z_1 + z_2 = 2z$
be even. With probability $2^{-\min(z_1, z_2)} \ge 2^{-z}$, the game can be completed in
$\min(z_1, z_2) \le z$ plays. Since the total capital of both players remains unchanged
during the game,

\[ P(\eta > z) \le 1 - 2^{-z}, \qquad \dots, \qquad P(\eta > Nz) \le (1 - 2^{-z})^N. \]

This evidently implies the finiteness of

\[ E\eta = \sum_{k=0}^\infty P(\eta > k). \]
We will now give a less trivial example of a random variable v which is indepen- 
dent of the future, but is not a stopping time. 


Example 4.4.6 Consider two mutually independent sequences of independent positive
random variables $\xi_1, \xi_2, \dots$ and $\zeta_1, \zeta_2, \dots$, such that $\xi_j \mathrel{\subset\!\!\!=} F$ and $\zeta_j \mathrel{\subset\!\!\!=} G$. Further,
consider a system consisting of two devices. After starting the system, the first device
operates for a random time $\xi_1$, after which it breaks down. Then the second
device replaces the first one and works for $\xi_2$ time units (over the time interval
$(\xi_1, \xi_1 + \xi_2)$). Immediately after the first device’s breakdown, one starts repairing it,
and the repair time is $\zeta_2$. If $\zeta_2 > \xi_2$, then at the time $\xi_1 + \xi_2$ of the second device’s
failure both devices are faulty and the system fails. If $\zeta_2 \le \xi_2$, then at the time $\xi_1 + \xi_2$
the first device starts working again and works for $\xi_3$ time units, while the second




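device will be under repair for $\zeta_3$ time units. If $\zeta_3 > \xi_3$, the system fails. If $\zeta_3 \le \xi_3$,
the second device will start working, etc. What is the expectation of the failure-free
operation time $\tau$ of the system?

Let $\nu := \min\{k \ge 2 : \zeta_k > \xi_k\}$. Then clearly $\tau = \xi_1 + \cdots + \xi_\nu$, where the $\xi_j$
are independent and identically distributed and $\{\nu \le n\} \in \sigma(\xi_1, \dots, \xi_n; \zeta_1, \dots, \zeta_n)$.
This means that $\nu$ is independent of the future. At the same time, if $\zeta_j \not\equiv \text{const}$, then
$\{\nu \le n\} \notin \mathfrak{F}_{1,n} = \sigma(\xi_1, \dots, \xi_n)$ and $\nu$ is not a Markov time with respect to $\mathfrak{F}_{1,n}$.
Since $\xi_k \ge 0$, by Wald’s identity $E\tau = E\nu \, E\xi_1$. Since

\[ \{\nu = k\} = \bigcap_{j=2}^{k-1} \{\zeta_j \le \xi_j\} \cap \{\zeta_k > \xi_k\}, \qquad k \ge 2, \]

one has $P(\nu = k) = q^{k-2}(1 - q)$, $k \ge 2$, where

\[ q = P(\zeta_k \le \xi_k) = \int dF(t) \, G(t + 0). \]

Consequently,

\[ E\nu = \sum_{k=2}^\infty k q^{k-2} (1 - q) = 1 + \sum_{k=1}^\infty k q^{k-1} (1 - q) = 1 + \frac{1}{1 - q}, \qquad E\tau = E\xi_1 \, \frac{2 - q}{1 - q}. \]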
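The formula for the mean failure-free operation time lends itself to a quick Monte Carlo check. The sketch below assumes, purely for illustration, exponential operation times with rate 1 and exponential repair times with rate 2, so that $q = 2/3$ and the formula gives the value 4; the function name, the number of trials and the seed are arbitrary choices.

```python
import random

def mean_failure_free_time(rate_work, rate_repair, trials, seed=0):
    """Monte Carlo estimate of the mean failure-free operation time of the
    two-device system, with exponential operation and repair times (an assumption)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        tau = rng.expovariate(rate_work)            # first device runs
        while True:
            work = rng.expovariate(rate_work)       # the standby device runs
            repair = rng.expovariate(rate_repair)   # the failed device is repaired
            tau += work
            if repair > work:                       # repair not finished in time
                break
        total += tau
    return total / trials

estimate = mean_failure_free_time(1.0, 2.0, trials=100_000)
print(estimate)   # should be close to (2 - q) / (1 - q) = 4 for q = 2/3
```

Note that here the number of summands depends on the operation times themselves (a short operation time makes a failure more likely), so the agreement is a genuine illustration of Wald’s identity rather than of the independent case.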


Wald’s identity has a number of extensions (we will discuss these in more detail 
in Sects. 10.3 and 15.2). 


4.5 Variance 
We introduce one more numerical characteristic for random variables. 


Definition 4.5.1 The variance $\operatorname{Var}(\xi)$ of a random variable $\xi$ is the quantity

\[ \operatorname{Var}(\xi) := E(\xi - E\xi)^2. \]

It is a measure of the “dispersion” or “spread” of the distribution of $\xi$. The variance
is equal to the moment of inertia of the distribution of unit mass along the line.
We have

\[ \operatorname{Var}(\xi) = E\xi^2 - 2E\xi \, E\xi + (E\xi)^2 = E\xi^2 - (E\xi)^2. \tag{4.5.1} \]

The variance could also be defined as $\min_a E(\xi - a)^2$. Indeed, by that definition

\[ \operatorname{Var}(\xi) = E\xi^2 + \min_a (a^2 - 2aE\xi) = E\xi^2 - (E\xi)^2, \]

since the minimum of $a^2 - 2aE\xi$ is attained at the point $a = E\xi$. This remark shows
that the quantity $a = E\xi$ is the best mean square estimate (approximation) of the
random variable $\xi$.
The quantity $\sqrt{\operatorname{Var}(\xi)}$ is called the standard deviation of $\xi$.



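Example 4.5.1 Let $\xi \mathrel{\subset\!\!\!=} \Phi_{a,\sigma^2}$. As we saw in Example 4.1.2, $a = E\xi$. Therefore,

\[ \operatorname{Var}(\xi) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} (x - a)^2 e^{-(x - a)^2 / 2\sigma^2} \, dx = \frac{\sigma^2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} t^2 e^{-t^2/2} \, dt. \]

The last equality was obtained by the variable change $(x - a)/\sigma = t$. Integrating by
parts, one gets

\[ \operatorname{Var}(\xi) = -\frac{\sigma^2}{\sqrt{2\pi}} \, t e^{-t^2/2} \Big|_{-\infty}^{\infty} + \frac{\sigma^2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-t^2/2} \, dt = \sigma^2. \]


Example 4.5.2 Let $\xi \mathrel{\subset\!\!\!=} \Pi_\mu$. In Example 4.1.3 we computed the expectation $E\xi = \mu$.
Hence

\[ \operatorname{Var}(\xi) = E\xi^2 - (E\xi)^2 = \sum_{k=0}^\infty \frac{k^2 \mu^k}{k!} e^{-\mu} - \mu^2 = \sum_{k=2}^\infty \frac{k(k - 1)\mu^k}{k!} e^{-\mu} + \sum_{k=0}^\infty \frac{k \mu^k}{k!} e^{-\mu} - \mu^2 = \mu^2 + \mu - \mu^2 = \mu. \]

Example 4.5.3 For $\xi \mathrel{\subset\!\!\!=} U_{0,1}$, one has

\[ E\xi^2 = \int_0^1 x^2 \, dx = \frac{1}{3}, \qquad E\xi = \frac{1}{2}. \]

By (4.5.1) we obtain $\operatorname{Var}(\xi) = \frac{1}{12}$.


Example 4.5.4 For $\xi \mathrel{\subset\!\!\!=} B_p$, by virtue of the relations $\xi^2 = \xi$ and $E\xi^2 = E\xi = p$ we
obtain $\operatorname{Var}(\xi) = p - p^2 = p(1 - p)$.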
Consider now some properties of the variance. 


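D1. $\operatorname{Var}(\xi) \ge 0$, with $\operatorname{Var}(\xi) = 0$ if and only if $P(\xi = c) = 1$, where $c$ is a constant
(not depending on $\omega$).
The first assertion is obvious, for $\operatorname{Var}(\xi) = E(\xi - E\xi)^2 \ge 0$. Let
$P(\xi = c) = 1$, then $(E\xi)^2 = E\xi^2 = c^2$ and hence

\[ \operatorname{Var}(\xi) = c^2 - c^2 = 0. \]

If $\operatorname{Var}(\xi) = E(\xi - E\xi)^2 = 0$ then (since $(\xi - E\xi)^2 \ge 0$) $P(\xi - E\xi = 0) = 1$, or
$P(\xi = E\xi) = 1$ (see property E4).
D2. If $a$ and $b$ are constants then

\[ \operatorname{Var}(a + b\xi) = b^2 \operatorname{Var}(\xi). \]

This property follows immediately from the definition of $\operatorname{Var}(\xi)$.
D3. If random variables $\xi$ and $\eta$ are independent then

\[ \operatorname{Var}(\xi + \eta) = \operatorname{Var}(\xi) + \operatorname{Var}(\eta). \]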




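Indeed,

\[ \begin{aligned} \operatorname{Var}(\xi + \eta) &= E(\xi + \eta)^2 - (E\xi + E\eta)^2 \\ &= E\xi^2 + 2E\xi \, E\eta + E\eta^2 - (E\xi)^2 - (E\eta)^2 - 2E\xi \, E\eta \\ &= E\xi^2 - (E\xi)^2 + E\eta^2 - (E\eta)^2 = \operatorname{Var}(\xi) + \operatorname{Var}(\eta). \end{aligned} \]

It is seen from the computations that the variance will be additive not only for independent
$\xi$ and $\eta$, but also whenever

\[ E\xi\eta = E\xi \, E\eta. \]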


Example 4.5.5 Let $\nu \ge 0$ be an integer-valued random variable independent of a
sequence $\{\xi_j\}$ of independent identically distributed random variables, $E\nu < \infty$ and
$E\xi_j = a$. Then, as we know, $ES_\nu = aE\nu$. What is the variance of $S_\nu$?

By the total probability formula,

\[ \begin{aligned} \operatorname{Var}(S_\nu) = E(S_\nu - ES_\nu)^2 &= \sum_k P(\nu = k) E(S_k - ES_\nu)^2 \\ &= \sum_k P(\nu = k) \bigl[ E(S_k - ak)^2 + (ak - aE\nu)^2 \bigr] \\ &= \sum_k P(\nu = k) \, k \operatorname{Var}(\xi_1) + a^2 E(\nu - E\nu)^2 = \operatorname{Var}(\xi_1) E\nu + a^2 \operatorname{Var}(\nu). \end{aligned} \]

This equality is equivalent to the relation

\[ E(S_\nu - \nu a)^2 = E\nu \cdot \operatorname{Var}(\xi_1). \]

In this form, the relation remains valid for any stopping time $\nu$ (see Chap. 15).
Making use of it, one can find in Example 4.4.5 the expectation of the time $\eta$ until
the end of the fair game, when the initial capitals $z_1$ and $z_2$ of the players are finite.
Indeed, in that case $a = 0$, $\operatorname{Var}(\xi_1) = 1$ and

\[ ES_\eta^2 = \operatorname{Var}(\xi_1) E\eta = z_1^2 P_2 + z_2^2 P_1. \]

We find from this that $E\eta = z_1 z_2$.
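Both the winning probability $z_1/(z_1 + z_2)$ of the first gambler and the mean duration $z_1 z_2$ of the game can be confirmed numerically. The sketch below (the function name and the tolerance `tol` are illustrative choices) propagates the distribution of the first gambler’s gain play by play until the probability that the game is still running is negligible.

```python
def finite_ruin(z1, z2, tol=1e-12):
    """Fair game with integer capitals z1, z2 >= 1:
    return (P(first gambler wins), mean duration of the game)."""
    alive = {0: 1.0}            # gain of the first gambler -> probability, game still running
    p_win, mean_time, k = 0.0, 0.0, 0
    while sum(alive.values()) > tol:
        k += 1
        step = {}
        for s, prob in alive.items():
            for nxt in (s - 1, s + 1):
                if nxt == z2:            # second gambler ruined at play k
                    p_win += prob / 2
                    mean_time += k * prob / 2
                elif nxt == -z1:         # first gambler ruined at play k
                    mean_time += k * prob / 2
                else:
                    step[nxt] = step.get(nxt, 0.0) + prob / 2
        alive = step
    return p_win, mean_time

print(finite_ruin(3, 5))   # close to (3/8, 15)
```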


4.6 The Correlation Coefficient and Other Numerical 
Characteristics 


Two random variables $\xi$ and $\eta$ could be functionally (deterministically) dependent:
$\xi = g(\eta)$; they could be dependent, but not in a deterministic way; finally, they could
be independent. The correlation coefficient of random variables is a quantity which
can be used to quantify the degree of dependence of the variables on each other.

All the random variables to appear in the present section are assumed to have
finite non-zero variances.

A random variable $\xi$ is said to be standardised if $E\xi = 0$ and $\operatorname{Var}(\xi) = 1$. Any
random variable $\xi$ can be reduced by a linear transformation to a standardised one
by putting $\xi_1 := (\xi - E\xi)/\sqrt{\operatorname{Var}(\xi)}$. Let $\xi$ and $\eta$ be two random variables and $\xi_1$
and $\eta_1$ the respective standardised random variables.




Definition 4.6.1 The correlation coefficient of the random variables $\xi$ and $\eta$ is the
quantity $\rho(\xi, \eta) = E\xi_1\eta_1$.


Properties of the correlation coefficient.
1. $|\rho(\xi, \eta)| \le 1$.


Proof Indeed,

\[ 0 \le \operatorname{Var}(\xi_1 \pm \eta_1) = E(\xi_1 \pm \eta_1)^2 = 2 \pm 2\rho(\xi, \eta). \]

It follows that $|\rho(\xi, \eta)| \le 1$.


2. If $\xi$ and $\eta$ are independent then $\rho(\xi, \eta) = 0$.
This follows from the fact that $\xi_1$ and $\eta_1$ are also independent in this case.


The converse assertion is not true, of course. In Sect. 4.3 we gave an example of
dependent random variables $\xi$ and $\eta$ such that $E\xi = 0$, $E\eta = 0$ and $E\xi\eta = 0$. The
correlation coefficient of these variables is equal to 0, yet they are dependent. However,
as we will see in Chap. 7, for a normally distributed vector $(\xi, \eta)$ the equality
$\rho(\xi, \eta) = 0$ is necessary and sufficient for the independence of its components.

Another example where the non-correlation of random variables implies their
independence is given by the Bernoulli scheme. Let $P(\xi = 1) = p$, $P(\xi = 0) = 1 - p$,
$P(\eta = 1) = q$ and $P(\eta = 0) = 1 - q$. Then

\[ E\xi = p, \qquad E\eta = q, \qquad \operatorname{Var}(\xi) = p(1 - p), \qquad \operatorname{Var}(\eta) = q(1 - q), \]
\[ \rho(\xi, \eta) = \frac{E(\xi - p)(\eta - q)}{\sqrt{p(1 - p) q(1 - q)}}. \]

The equality $\rho(\xi, \eta) = 0$ means that $E\xi\eta = pq$, or, which is the same,

\[ P(\xi = 1, \eta = 1) = P(\xi = 1) P(\eta = 1), \]
\[ P(\xi = 1, \eta = 0) = P(\xi = 1) - P(\xi = 1, \eta = 1) = p - pq = P(\xi = 1) P(\eta = 0), \]

and so on.
One can easily obtain from this that, in the general case, $\xi$ and $\eta$ are independent
if

\[ \rho(f(\xi), g(\eta)) = 0 \]

for any bounded measurable functions $f$ and $g$. It suffices to take $f = I_{(-\infty, x)}$,
$g = I_{(-\infty, y)}$, then derive that $P(\xi < x, \eta < y) = P(\xi < x) P(\eta < y)$, and make use
of the results of the previous chapter.
3. $|\rho(\xi, \eta)| = 1$ if and only if there exist numbers $a$ and $b \ne 0$ such that
$P(\eta = a + b\xi) = 1$.


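Proof Let $P(\eta = a + b\xi) = 1$. Set $E\xi = \alpha$ and $\sqrt{\operatorname{Var}(\xi)} = \sigma$; then

\[ \rho(\xi, \eta) = E \, \frac{\xi - \alpha}{\sigma} \cdot \frac{a + b\xi - a - b\alpha}{|b|\sigma} = \operatorname{sign} b. \]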




Assume now that $|\rho(\xi, \eta)| = 1$. Let, for instance, $\rho(\xi, \eta) = 1$. Then

\[ \operatorname{Var}(\xi_1 - \eta_1) = 2(1 - \rho(\xi, \eta)) = 0. \]

By property D1 of the variance, this can be the case if and only if

\[ P(\xi_1 - \eta_1 = c) = 1. \]

If $\rho(\xi, \eta) = -1$ then we get $\operatorname{Var}(\xi_1 + \eta_1) = 0$, and hence

\[ P(\xi_1 + \eta_1 = c) = 1. \]


If $\rho > 0$ then the random variables $\xi$ and $\eta$ are said to be positively correlated; if
$\rho < 0$ then $\xi$ and $\eta$ are said to be negatively correlated.


Example 4.6.1 Consider a transmitting device. A random variable $\xi$ denotes the
magnitude of the transmitted signal. Because of interference, a receiver gets the
variable $\eta = \alpha\xi + \Delta$ ($\alpha$ is the amplification coefficient, $\Delta$ is the noise). Assume that
the random variables $\Delta$ and $\xi$ are independent. Let $E\xi = a$, $\operatorname{Var}(\xi) = 1$, $E\Delta = 0$
and $\operatorname{Var}(\Delta) = \sigma^2$. Compute the correlation coefficient of $\xi$ and $\eta$:

\[ \rho(\xi, \eta) = E \Bigl( (\xi - a) \, \frac{\alpha\xi + \Delta - a\alpha}{\sqrt{\alpha^2 + \sigma^2}} \Bigr) = \frac{\alpha}{\sqrt{\alpha^2 + \sigma^2}}. \]

If $\sigma$ is a large number compared to the amplification $\alpha$, then $\rho$ is close to 0 and $\eta$
essentially does not depend on $\xi$. If $\sigma$ is small compared to $\alpha$, then $\rho$ is close to 1,
and one can easily reconstruct $\xi$ from $\eta$.
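The dependence of the correlation coefficient on the ratio of amplification to noise can be illustrated by an exact computation. The two-point distributions used below for the signal and the noise are assumptions made only to satisfy the stated moment conditions; the function name is hypothetical.

```python
import math
from itertools import product

def correlation(alpha, sigma, a=0.0):
    """Correlation of xi and eta = alpha * xi + Delta for two-point laws
    (an illustrative choice) with Var(xi) = 1, E Delta = 0, Var(Delta) = sigma**2."""
    pairs = [(x, alpha * x + d) for x, d in product((a - 1, a + 1), (-sigma, sigma))]
    n = len(pairs)                      # the four outcomes are equally likely
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs) / n
    vx = sum((x - mx) ** 2 for x, _ in pairs) / n
    vy = sum((y - my) ** 2 for _, y in pairs) / n
    return cov / math.sqrt(vx * vy)

for sigma in (0.1, 1.0, 10.0):
    # second and third columns agree; the correlation falls as the noise grows
    print(sigma, correlation(1.0, sigma), 1.0 / math.sqrt(1.0 + sigma ** 2))
```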


We consider some further numerical characteristics of random variables. One 
often uses the so-called higher order moments. 


Definition 4.6.2 The $k$-th order moment of a random variable $\xi$ is the quantity $E\xi^k$.
The quantity $E(\xi - E\xi)^k$ is called the $k$-th order central moment, so the variance is
the second central moment of $\xi$.

Given a random vector $(\xi_1, \dots, \xi_n)$, the quantity $E\xi_1^{k_1} \cdots \xi_n^{k_n}$ is called the mixed
moment of order $k = k_1 + \cdots + k_n$. Similarly, $E(\xi_1 - E\xi_1)^{k_1} \cdots (\xi_n - E\xi_n)^{k_n}$ is said to
be the central mixed moment of the same order.


For independent random variables, mixed moments are evidently equal to the
products of the respective usual moments.


4.7 Inequalities 


4.7.1 Moment Inequalities 


Theorem 4.7.1 (Cauchy–Bunjakovsky’s inequality) If $\xi_1$ and $\xi_2$ are arbitrary random
variables, then

\[ E|\xi_1\xi_2| \le \bigl[ E\xi_1^2 \, E\xi_2^2 \bigr]^{1/2}. \]


This inequality is also sometimes called the Schwarz inequality. 


Proof The required relation follows from the inequality 2|ab| < a” + b? if one puts 
a? = &? /E&?, b= & / Eg; and takes the expectations of the both sides. 


The Cauchy—Bunjakovsky inequality is a special case of more general inequali- 
ties. 


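Theorem 4.7.2 For $r > 1$, $\frac{1}{r} + \frac{1}{s} = 1$, one has Hölder's inequality:
$$\mathbf{E}|\xi_1\xi_2| \le \big[\mathbf{E}|\xi_1|^r\big]^{1/r}\big[\mathbf{E}|\xi_2|^s\big]^{1/s},$$
and Minkowski's inequality:
$$\big[\mathbf{E}|\xi_1 + \xi_2|^r\big]^{1/r} \le \big[\mathbf{E}|\xi_1|^r\big]^{1/r} + \big[\mathbf{E}|\xi_2|^r\big]^{1/r}.$$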
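Proof Since $x^r$ is, for $r > 1$, a convex function in the domain $x > 0$, which at the point $x = 1$ is equal to 1 and has derivative equal to $r$, one has $r(x - 1) \le x^r - 1$ for all $x > 0$. Putting $x = (a/b)^{1/r}$ ($a > 0$, $b > 0$), we obtain
$$a^{1/r}b^{-1/r} - 1 \le \frac{a}{rb} - \frac{1}{r},$$
or, which is the same (multiply by $b$ and use $1 - 1/r = 1/s$), $a^{1/r}b^{1/s} \le a/r + b/s$. If one puts
$$a := \frac{|\xi_1|^r}{\mathbf{E}|\xi_1|^r}, \qquad b := \frac{|\xi_2|^s}{\mathbf{E}|\xi_2|^s}$$
and takes the expectations, one gets Hölder's inequality.

To prove Minkowski's inequality, note that, by the inequality $|\xi_1 + \xi_2| \le |\xi_1| + |\xi_2|$, one has
$$\mathbf{E}|\xi_1 + \xi_2|^r \le \mathbf{E}|\xi_1||\xi_1 + \xi_2|^{r-1} + \mathbf{E}|\xi_2||\xi_1 + \xi_2|^{r-1}.$$
Applying Hölder's inequality to the terms on the right-hand side, we obtain
$$\mathbf{E}|\xi_1 + \xi_2|^r \le \Big\{\big[\mathbf{E}|\xi_1|^r\big]^{1/r} + \big[\mathbf{E}|\xi_2|^r\big]^{1/r}\Big\}\big[\mathbf{E}|\xi_1 + \xi_2|^{(r-1)s}\big]^{1/s}.$$
Since $(r - 1)s = r$ and $1 - 1/s = 1/r$, Minkowski's inequality follows.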


It is obvious that, for $r = s = 2$, Hölder's inequality becomes the Schwarz inequality.
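
Both inequalities of Theorem 4.7.2 are easy to test on a finite sample space, where expectations are finite sums. The sketch below is illustrative only: the three-point probability space and the values of the two random variables are arbitrary choices made here, and the helper names are hypothetical.

```python
def expect(values, probs):
    """Expectation of a random variable given by its values on a finite sample space."""
    return sum(v * p for v, p in zip(values, probs))


def holder_sides(x1, x2, probs, r):
    """Return (E|x1 x2|, [E|x1|^r]^(1/r) [E|x2|^s]^(1/s)) with 1/r + 1/s = 1."""
    s = r / (r - 1.0)  # conjugate exponent
    lhs = expect([abs(a * b) for a, b in zip(x1, x2)], probs)
    rhs = (expect([abs(a) ** r for a in x1], probs) ** (1.0 / r)
           * expect([abs(b) ** s for b in x2], probs) ** (1.0 / s))
    return lhs, rhs


def minkowski_sides(x1, x2, probs, r):
    """Return ([E|x1 + x2|^r]^(1/r), [E|x1|^r]^(1/r) + [E|x2|^r]^(1/r))."""
    def norm(x):
        return expect([abs(a) ** r for a in x], probs) ** (1.0 / r)
    return norm([a + b for a, b in zip(x1, x2)]), norm(x1) + norm(x2)


# Arbitrary illustrative data (an assumption of this sketch).
PROBS = [0.2, 0.3, 0.5]
X1 = [1.0, -2.0, 3.0]
X2 = [4.0, 0.5, -1.0]

if __name__ == "__main__":
    for r in (1.5, 2.0, 3.0):
        print(r, holder_sides(X1, X2, PROBS, r), minkowski_sides(X1, X2, PROBS, r))
```

For `r = 2.0` the first pair is exactly the two sides of the Cauchy–Bunjakovsky (Schwarz) inequality.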


Theorem 4.7.3 (Jensen's inequality) If $\mathbf{E}\xi$ exists and $g(x)$ is a convex function, then $g(\mathbf{E}\xi) \le \mathbf{E}g(\xi)$.

Proof If $g(x)$ is convex then for any $y$ there exists a number $g_1(y)$ such that, for all $x$,
$$g(x) \ge g(y) + (x - y)g_1(y).$$
Putting $x = \xi$, $y = \mathbf{E}\xi$, and taking the expectations of both sides of this inequality, we obtain
$$\mathbf{E}g(\xi) \ge g(\mathbf{E}\xi).$$


The following corollary is also often useful. 


Corollary 4.7.1 For any $0 < v < u$,
$$\big[\mathbf{E}|\xi|^v\big]^{1/v} \le \big[\mathbf{E}|\xi|^u\big]^{1/u}. \qquad (4.7.1)$$

This inequality shows, in particular, that if the $u$-th order moment exists, then the moments of any order $v < u$ also exist.

Inequality (4.7.1) follows from Hölder's inequality, if one puts $\xi_1 := |\xi|^v$, $\xi_2 := 1$, $r := u/v$, or from Jensen's inequality with $g(x) = |x|^{u/v}$ and $|\xi|^v$ in place of $\xi$.


4.7.2 Inequalities for Probabilities 


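Theorem 4.7.4 Let $\xi \ge 0$ with probability 1. Then, for any $x > 0$,
$$\mathbf{P}(\xi \ge x) \le \frac{\mathbf{E}(\xi;\ \xi \ge x)}{x} \le \frac{\mathbf{E}\xi}{x}. \qquad (4.7.2)$$
If $\mathbf{E}\xi < \infty$ then $\mathbf{P}(\xi \ge x) = o(1/x)$ as $x \to \infty$.

Proof The inequality is proved by the following relations:
$$\mathbf{E}\xi \ge \mathbf{E}(\xi;\ \xi \ge x) \ge x\,\mathbf{E}(1;\ \xi \ge x) = x\,\mathbf{P}(\xi \ge x).$$
If $\mathbf{E}\xi < \infty$ then $\mathbf{E}(\xi;\ \xi \ge x) \to 0$ as $x \to \infty$. This proves the second statement of the theorem.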


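If a function $g(x) \ge 0$ is monotonically increasing, then clearly $\{\xi : g(\xi) \ge g(\varepsilon)\} = \{\xi : \xi \ge \varepsilon\}$ and, applying Theorem 4.7.4 to the random variable $\eta = g(\xi)$, one gets

Corollary 4.7.2 If $g(x) \uparrow$, $g(x) \ge 0$, then
$$\mathbf{P}(\xi \ge x) \le \frac{\mathbf{E}(g(\xi);\ \xi \ge x)}{g(x)} \le \frac{\mathbf{E}g(\xi)}{g(x)}.$$
In particular, for $g(x) = e^{\lambda x}$,
$$\mathbf{P}(\xi \ge x) \le e^{-\lambda x}\,\mathbf{E}e^{\lambda\xi}, \qquad \lambda > 0.$$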


Corollary 4.7.3 (Chebyshev's inequality) For an arbitrary random variable $\xi$ with a finite variance,
$$\mathbf{P}\big(|\xi - \mathbf{E}\xi| \ge x\big) \le \frac{\operatorname{Var}(\xi)}{x^2}. \qquad (4.7.3)$$

To prove (4.7.3), it suffices to apply Theorem 4.7.4 to the random variable $\eta = (\xi - \mathbf{E}\xi)^2 \ge 0$.


The assertions of Theorem 4.7.4 and Corollary 4.7.2 are also often called Cheby- 
shev’s inequalities (or Chebyshev type inequalities), since in regard to their proofs, 
they are unessential generalisations of (4.7.3). 
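
To get a feeling for how the bounds (4.7.2), (4.7.3) and the exponential bound of Corollary 4.7.2 compare with exact probabilities, one can evaluate them for a distribution whose tails are computable directly. The sketch below uses a binomial random variable; this choice of distribution, the parameter values and the function names are assumptions made here for illustration and do not come from the text.

```python
import math


def binom_pmf(n: int, k: int, p: float) -> float:
    """P(xi = k) for a binomial random variable with parameters n and p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def exact_two_sided_tail(n: int, p: float, x: float) -> float:
    """Exact P(|xi - E xi| >= x), where E xi = n p."""
    mean = n * p
    return sum(binom_pmf(n, k, p) for k in range(n + 1) if abs(k - mean) >= x)


def chebyshev_bound(n: int, p: float, x: float) -> float:
    """Right-hand side of (4.7.3): Var(xi) / x^2 with Var(xi) = n p (1 - p)."""
    return n * p * (1 - p) / x ** 2


def exact_upper_tail(n: int, p: float, x: float) -> float:
    """Exact P(xi >= x)."""
    return sum(binom_pmf(n, k, p) for k in range(n + 1) if k >= x)


def markov_bound(n: int, p: float, x: float) -> float:
    """Right-hand side of (4.7.2): E xi / x."""
    return n * p / x


def exponential_bound(n: int, p: float, x: float, lam: float) -> float:
    """exp(-lam x) E exp(lam xi); for the binomial law E exp(lam xi) = (1 - p + p e^lam)^n."""
    return math.exp(-lam * x) * (1 - p + p * math.exp(lam)) ** n


if __name__ == "__main__":
    n, p = 100, 0.5
    print("two-sided:", exact_two_sided_tail(n, p, 10), chebyshev_bound(n, p, 10))
    print("upper tail:", exact_upper_tail(n, p, 60), markov_bound(n, p, 60),
          exponential_bound(n, p, 60, 0.4))
```

In each printed line the first number is the exact probability and the remaining ones are upper bounds for it; the value `lam = 0.4` is just one admissible choice of $\lambda > 0$, not an optimised one.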

Using Chebyshev's inequality, we can bound probabilities of various deviations of $\xi$ knowing only $\mathbf{E}\xi$ and $\operatorname{Var}(\xi)$. As one of the first applications of this inequality, we will derive the so-called law of large numbers in Chebyshev's form (the law of large numbers in a more general form will be obtained in Chap. 8).


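Theorem 4.7.5 Let $\xi_1, \xi_2, \dots$ be independent identically distributed random variables with expectation $\mathbf{E}\xi_j = a$ and finite variance $\sigma^2$ and let $S_n = \sum_{j=1}^{n}\xi_j$. Then, for any $\varepsilon > 0$,
$$\mathbf{P}\Big(\Big|\frac{S_n}{n} - a\Big| > \varepsilon\Big) \le \frac{\sigma^2}{n\varepsilon^2} \to 0$$
as $n \to \infty$.

We will discuss this assertion in Chaps. 5, 6 and 8.

Proof of Theorem 4.7.5 follows from Chebyshev's inequality, for
$$\mathbf{E}\,\frac{S_n}{n} = a, \qquad \operatorname{Var}\Big(\frac{S_n}{n}\Big) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$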


Now we will give a computational example of the use of Chebyshev’s inequality. 


Example 4.7.1 Assume we decided to measure the diameter of the lunar disk using photographs made with a telescope. Due to atmospheric interference, measurements of pictures made at different times give different results. Let $\xi - a$ denote the deviation of the result of a measurement from the true value $a$, $\mathbf{E}\xi = a$ and $\sigma = \sqrt{\operatorname{Var}(\xi)} = 1$ on a certain scale. Carry out a series of $n$ independent measurements and put $\zeta_n := \frac{1}{n}(\xi_1 + \cdots + \xi_n)$. Then, as we saw, $\mathbf{E}\zeta_n = a$, $\operatorname{Var}(\zeta_n) = \sigma^2/n$. Since the variance of the average of the measurements decreases as the number of observations increases, it is natural to estimate the quantity $a$ by $\zeta_n$.

How many observations should be made to ensure $|\zeta_n - a| < 0.1$ with a probability greater than 0.99? That is, we must have $\mathbf{P}(|\zeta_n - a| < 0.1) > 0.99$, or $\mathbf{P}(|\zeta_n - a| \ge 0.1) < 0.01$. By Chebyshev's inequality, $\mathbf{P}(|\zeta_n - a| \ge 0.1) \le \sigma^2/(n \cdot 0.01)$. Therefore, if $n$ is chosen so that $\sigma^2/(n \cdot 0.01) < 0.01$ then the required inequality will be satisfied. Hence we get $n > 10^4$.


The above example illustrates the possibility of using Chebyshev’s inequality to 
bound the probabilities of the deviations of random variables. However, this exam- 
ple is an even better illustration of how crude Chebyshev’s inequality is for practical 
purposes. If the reader returns to Example 4.7.1 after meeting with the central limit theorem in Chap. 8, he/she will easily calculate that, to achieve the required accuracy, one actually needs to conduct not $10^4$, but only 670 observations.
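
The two sample sizes mentioned for Example 4.7.1 can be reproduced with a few lines of code. This is a sketch under stated assumptions: the second function treats the average of the measurements as exactly normal, which is only the approximation suggested by the central limit theorem of Chap. 8, and both function names are hypothetical. The first function uses the non-strict inequality $\sigma^2/(n\varepsilon^2) \le \alpha$, so it returns the boundary value rather than the first integer strictly beyond it.

```python
import math
from statistics import NormalDist


def n_chebyshev(sigma: float, eps: float, alpha: float) -> int:
    """Smallest integer n with sigma^2 / (n eps^2) <= alpha (Chebyshev's bound)."""
    return math.ceil(sigma ** 2 / (eps ** 2 * alpha))


def n_normal_approx(sigma: float, eps: float, alpha: float) -> int:
    """Smallest integer n with 2 (1 - Phi(eps sqrt(n) / sigma)) <= alpha.

    Assumption: the sample mean is treated as exactly normal with
    variance sigma^2 / n (normal approximation).
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided normal quantile
    return math.ceil((z * sigma / eps) ** 2)


if __name__ == "__main__":
    print("Chebyshev:", n_chebyshev(1.0, 0.1, 0.01))
    print("normal approximation:", n_normal_approx(1.0, 0.1, 0.01))
```

The first number is of order $10^4$, while the second is of the same order as the figure of 670 observations quoted above (the exact integer depends on rounding of the normal quantile).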


4.8 Extension of the Notion of Conditional Expectation 


In conclusion to the present chapter, we will introduce a notion which, along with 
those we have already discussed, is a useful and important tool in probability theory. 
Giving the reader the option to skip this section in the first reading of the book, we 
avoid direct use of this notion until Chaps. 13 and 15. 


4.8.1 Definition of Conditional Expectation 


In Sect. 4.2 we introduced the notion of conditional expectation given an arbitrary event $B$ with $\mathbf{P}(B) > 0$ that was defined by the equality
$$\mathbf{E}(\xi|B) := \frac{\mathbf{E}(\xi; B)}{\mathbf{P}(B)}, \qquad (4.8.1)$$
where
$$\mathbf{E}(\xi; B) = \int_B \xi\,d\mathbf{P} = \mathbf{E}\xi I_B,$$
$I_B = I_B(\omega)$ being the indicator of the set $B$. We have already seen and will see many times in what follows that this is a very useful notion. Definition (4.8.1) introducing this notion has, however, the deficiency that it makes no sense when $\mathbf{P}(B) = 0$. How could one overcome this deficiency?

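The fact that the condition $\mathbf{P}(B) > 0$ should not play any substantial role could be illustrated by the following considerations. Assume that $\xi$ and $\eta$ are independent, $B = \{\eta = x\}$ and $\mathbf{P}(B) > 0$. Then, for any measurable function $\varphi(x, y)$, one has according to (4.8.1) that
$$\mathbf{E}[\varphi(\xi,\eta)|\eta = x] = \frac{\mathbf{E}\varphi(\xi,\eta)I_{\{\eta = x\}}}{\mathbf{P}(\eta = x)} = \frac{\mathbf{E}\varphi(\xi,x)I_{\{\eta = x\}}}{\mathbf{P}(\eta = x)} = \mathbf{E}\varphi(\xi,x). \qquad (4.8.2)$$
The last equality holds because the random variables $\varphi(\xi, x)$ and $I_{\{\eta = x\}}$ are independent, being functions of $\xi$ and $\eta$ respectively, and consequently
$$\mathbf{E}\varphi(\xi,x)I_{\{\eta = x\}} = \mathbf{E}\varphi(\xi,x)\,\mathbf{P}(\eta = x).$$
Relations (4.8.2) show that the notion of conditional expectation could also retain its meaning in the case when the probability of the condition is 0, for the equality
$$\mathbf{E}[\varphi(\xi,\eta)|\eta = x] = \mathbf{E}\varphi(\xi,x)$$
itself looks quite natural for independent $\xi$ and $\eta$ and is by no means related to the assumption that $\mathbf{P}(\eta = x) > 0$.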


Fig. 4.1 Conditional expectation as the projection of $\xi$ onto $H_{\mathfrak{A}}$


Let $\mathfrak{A}$ be a sub-$\sigma$-algebra of $\mathfrak{F}$. We will now define the notion of the conditional expectation of a random variable $\xi$ given $\mathfrak{A}$, which will be denoted by $\mathbf{E}(\xi|\mathfrak{A})$. First we will give the definition in the "discrete" case, but in such a way that it can easily be extended.

Recall that we call discrete the case when the $\sigma$-algebra $\mathfrak{A}$ is formed (generated) by an at most countable sequence of disjoint events $A_1, A_2, \dots$, $\bigcup_j A_j = \Omega$, $\mathbf{P}(A_j) > 0$. We will write this as $\mathfrak{A} = \sigma(A_1, A_2, \dots)$, which means that the elements of $\mathfrak{A}$ are all possible unions of the sets $A_1, A_2, \dots$.

Let $L_2$ be the collection of all random variables (all the measurable functions $\xi(\omega)$ defined on $(\Omega, \mathfrak{F}, \mathbf{P})$) for which $\mathbf{E}\xi^2 < \infty$. In the linear space $L_2$ one can introduce the inner product $(\xi, \eta) = \mathbf{E}(\xi\eta)$ (whereby $L_2$ becomes a Hilbert space with the norm $\|\xi\| = (\mathbf{E}\xi^2)^{1/2}$; we identify two random variables $\xi_1$ and $\xi_2$ if $\|\xi_1 - \xi_2\| = 0$, see also Remark 6.1.1).

Now consider the linear space $H_{\mathfrak{A}}$ of all functions of the form
$$\xi(\omega) = \sum_k c_k I_{A_k}(\omega),$$
where $I_{A_k}(\omega)$ are indicators of the sets $A_k$. The space $H_{\mathfrak{A}}$ is clearly the space of all $\mathfrak{A}$-measurable functions, and one could think of it as the space spanned by the orthogonal system $\{I_{A_k}(\omega)\}$ in $L_2$.

We now turn to the definition of conditional expectation. We know that the conventional expectation $a = \mathbf{E}\xi$ of $\xi \in L_2$ can be defined as the unique point at which the minimum value of the function $\varphi(a) = \mathbf{E}(\xi - a)^2$ is attained (see Sect. 4.5). Consider now the problem of minimising the functional $\varphi(a) = \mathbf{E}(\xi - a(\omega))^2$, $\xi \in L_2$, over all $\mathfrak{A}$-measurable functions $a(\omega)$ from $H_{\mathfrak{A}}$.


Definition 4.8.1 Let $\xi \in L_2$. The $\mathfrak{A}$-measurable function $a(\omega)$ on which the minimum $\min_{a \in H_{\mathfrak{A}}}\varphi(a)$ is attained is said to be the conditional expectation of $\xi$ given $\mathfrak{A}$ and is denoted by $\mathbf{E}(\xi|\mathfrak{A})$.


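Thus, unlike the conventional expectations, the conditional expectation $\mathbf{E}(\xi|\mathfrak{A})$ is a random variable. Let us consider it in more detail. It is evident that the minimum of $\varphi(a)$ is attained when $a(\omega)$ is the projection $\hat\xi$ of the element $\xi$ in the space $L_2$ onto $H_{\mathfrak{A}}$, i.e. the element $\hat\xi \in H_{\mathfrak{A}}$ for which $\xi - \hat\xi \perp H_{\mathfrak{A}}$ (see Fig. 4.1). In that case, for any $a \in H_{\mathfrak{A}}$,
$$\hat\xi - a \in H_{\mathfrak{A}}, \qquad \xi - \hat\xi \perp \hat\xi - a,$$
$$\varphi(a) = \mathbf{E}(\xi - \hat\xi + \hat\xi - a)^2 = \mathbf{E}(\xi - \hat\xi)^2 + \mathbf{E}(\hat\xi - a)^2,$$
$$\varphi(a) \ge \varphi(\hat\xi),$$
and $\varphi(a) = \varphi(\hat\xi)$ if $a = \hat\xi$ a.s.

Thus, in $L_2$ the conditional expectation operation is just an orthoprojector onto $H_{\mathfrak{A}}$ ($\hat\xi = \mathbf{E}(\xi|\mathfrak{A})$ is the projection of $\xi$ onto $H_{\mathfrak{A}}$).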

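Since, for a discrete $\sigma$-algebra $\mathfrak{A}$, the element $\hat\xi$, being an element of $H_{\mathfrak{A}}$, has the form $\hat\xi = \sum_k c_k I_{A_k}$, the orthogonality condition $\xi - \hat\xi \perp H_{\mathfrak{A}}$ (or, which is the same, $\mathbf{E}(\xi - \hat\xi)I_{A_k} = 0$) determines uniquely the coefficients $c_k$:
$$\mathbf{E}(\xi I_{A_k}) = c_k\mathbf{P}(A_k), \qquad c_k = \frac{\mathbf{E}(\xi; A_k)}{\mathbf{P}(A_k)} = \mathbf{E}(\xi|A_k),$$
so that
$$\mathbf{E}(\xi|\mathfrak{A}) = \hat\xi = \sum_k \mathbf{E}(\xi|A_k)\,I_{A_k}.$$
Thus the random variable $\mathbf{E}(\xi|\mathfrak{A})$ is constant on $A_k$ and, on these sets, is equal to the average value of $\xi$ on $A_k$.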
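
In the discrete case the formula for $\mathbf{E}(\xi|\mathfrak{A})$ is a finite computation, so the projection property can be checked directly. The following sketch works on a four-point sample space with a two-set partition; the data and the helper names `cond_exp` and `mean_square` are illustrative assumptions, not taken from the text.

```python
def cond_exp(xi, probs, labels):
    """Conditional expectation given the sigma-algebra generated by a partition.

    labels[i] names the partition set A_k containing outcome i; the result
    lists the value of E(xi | A_k) for every outcome.
    """
    mass, weighted = {}, {}
    for x, p, lab in zip(xi, probs, labels):
        mass[lab] = mass.get(lab, 0.0) + p              # P(A_k)
        weighted[lab] = weighted.get(lab, 0.0) + x * p  # E(xi; A_k)
    return [weighted[lab] / mass[lab] for lab in labels]


def mean_square(xi, a, probs):
    """E(xi - a)^2 for functions xi and a on the finite sample space."""
    return sum(p * (x - y) ** 2 for x, y, p in zip(xi, a, probs))


# Illustrative data (an assumption of this sketch).
probs = [0.1, 0.2, 0.3, 0.4]
xi = [1.0, 5.0, 2.0, 4.0]
labels = ["A1", "A1", "A2", "A2"]

xi_hat = cond_exp(xi, probs, labels)

# Orthogonality E(xi - xi_hat) I_{A_k} for each partition set; both should vanish.
residuals = {
    lab: sum(p * (x - h) for x, h, p, l in zip(xi, xi_hat, probs, labels) if l == lab)
    for lab in set(labels)
}

if __name__ == "__main__":
    print(xi_hat)
    print(residuals)
    # Any other function constant on A1 and A2 gives a mean square error at least as large.
    print(mean_square(xi, xi_hat, probs), mean_square(xi, [3.0, 3.0, 3.5, 3.5], probs))
```

The residuals vanish up to rounding, which is the orthogonality condition $\xi - \hat\xi \perp H_{\mathfrak{A}}$, and the mean square error is smallest at $\hat\xi$ among the functions constant on the partition sets.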

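If $\xi$ and $\mathfrak{A}$ are independent (i.e. $\mathbf{P}(\xi \in B; A_k) = \mathbf{P}(\xi \in B)\mathbf{P}(A_k)$) then clearly $\mathbf{E}(\xi; A_k) = \mathbf{E}\xi\,\mathbf{P}(A_k)$ and $\hat\xi = \mathbf{E}\xi$. If $\mathfrak{A} = \mathfrak{F}$ then $\mathfrak{F}$ is also discrete, $\xi$ is constant on the sets $A_k$ and hence $\hat\xi = \xi$.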

Now note the following basic properties of conditional expectation which allow one to get rid of the two special assumptions (that $\xi \in L_2$ and $\mathfrak{A}$ is discrete), which were introduced at first to gain a better understanding of the nature of conditional expectation:

(1) $\hat\xi$ is $\mathfrak{A}$-measurable.
(2) For any event $A \in \mathfrak{A}$,
$$\mathbf{E}(\hat\xi; A) = \mathbf{E}(\xi; A).$$

The former property is obvious. The latter follows from the fact that any event $A \in \mathfrak{A}$ can be represented as $A = \bigcup_k A_{j_k}$, and hence
$$\mathbf{E}(\hat\xi; A) = \sum_k \mathbf{E}(\hat\xi; A_{j_k}) = \sum_k c_{j_k}\mathbf{P}(A_{j_k}) = \sum_k \mathbf{E}(\xi; A_{j_k}) = \mathbf{E}(\xi; A).$$
The meaning of this property is rather clear: averaging the variable $\xi$ over the set $A$ gives the same result as averaging the variable $\hat\xi$ which has already been averaged over $A_{j_k}$.


Lemma 4.8.1 Properties (1) and (2) uniquely determine the conditional expectation and are equivalent to Definition 4.8.1.

Proof In one direction the assertion of the lemma has already been proved. Assume now that conditions (1) and (2) hold. $\mathfrak{A}$-measurability of $\hat\xi$ means that $\hat\xi$ is constant on each set $A_k$. Denote by $c_k$ the value of $\hat\xi$ on $A_k$. Since $A_k \in \mathfrak{A}$, it follows from property (2) that
$$\mathbf{E}(\hat\xi; A_k) = c_k\mathbf{P}(A_k) = \mathbf{E}(\xi; A_k),$$
and hence, for $\omega \in A_k$,
$$\hat\xi = c_k = \frac{\mathbf{E}(\xi; A_k)}{\mathbf{P}(A_k)}.$$
The lemma is proved.


Now we can give the general definition of conditional expectation. 


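Definition 4.8.2 Let $\xi$ be a random variable on a probability space $(\Omega, \mathfrak{F}, \mathbf{P})$ and $\mathfrak{A} \subset \mathfrak{F}$ an arbitrary sub-$\sigma$-algebra of $\mathfrak{F}$. The conditional expectation of $\xi$ given $\mathfrak{A}$ is a random variable $\hat\xi$ which is denoted by $\mathbf{E}(\xi|\mathfrak{A})$ and has the following two properties:

(1) $\hat\xi$ is $\mathfrak{A}$-measurable.
(2) For any $A \in \mathfrak{A}$, one has $\mathbf{E}(\hat\xi; A) = \mathbf{E}(\xi; A)$.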


In this definition, the random variable $\xi$ can be both scalar and vector-valued.

There immediately arises the question of whether such a random variable exists 
and is unique. In the discrete case we saw that the answer to this question is positive. 
In the general case, the following assertion holds true. 


Theorem 4.8.1 If $\mathbf{E}|\xi|$ is finite, then the function $\hat\xi = \mathbf{E}(\xi|\mathfrak{A})$ in Definition 4.8.2 always exists and is unique up to its values on a set of probability 0.

Proof First assume that $\xi$ is scalar and $\xi \ge 0$. Then the set function
$$\mathbf{Q}(A) = \int_A \xi\,d\mathbf{P} = \mathbf{E}(\xi; A), \qquad A \in \mathfrak{A},$$
will be a measure on $(\Omega, \mathfrak{A})$ which is absolutely continuous with respect to $\mathbf{P}$, for $\mathbf{P}(A) = 0$ implies $\mathbf{Q}(A) = 0$. Therefore, by the Radon–Nikodym theorem (see Appendix 3), there exists an $\mathfrak{A}$-measurable function $\hat\xi = \mathbf{E}(\xi|\mathfrak{A})$ which is unique up to its values on a set of measure zero and such that
$$\mathbf{Q}(A) = \int_A \hat\xi\,d\mathbf{P} = \mathbf{E}(\hat\xi; A).$$
In the general case we put $\xi = \xi^+ - \xi^-$, where $\xi^+ := \max(0, \xi) \ge 0$, $\xi^- := \max(0, -\xi) \ge 0$, and $\hat\xi := \hat\xi^+ - \hat\xi^-$, where $\hat\xi^{\pm}$ are the conditional expectations of $\xi^{\pm}$. This proves the existence of the conditional expectation, since $\hat\xi$ satisfies conditions (1) and (2) of Definition 4.8.2. This will also imply uniqueness, for the assumption of non-uniqueness of $\hat\xi$ would imply non-uniqueness of $\hat\xi^+$ or $\hat\xi^-$. The proof for vector-valued $\xi$ reduces to the one-dimensional case, since the components of $\hat\xi$ will possess properties (1) and (2) and, for the components, the existence and uniqueness have already been proved.


The essence of the above proof is quite transparent: by condition (2), for any $A \in \mathfrak{A}$ we are given the value
$$\mathbf{E}(\hat\xi; A) = \int_A \xi\,d\mathbf{P},$$
i.e. the values of the integrals of $\hat\xi$ over all sets $A \in \mathfrak{A}$ are given. This clearly should define an $\mathfrak{A}$-measurable function $\hat\xi$ uniquely up to its values on a set of measure zero.

The meaning of $\mathbf{E}(\xi|\mathfrak{A})$ remains the same: roughly speaking, this is the result of averaging of $\xi$ over "indivisible" elements of $\mathfrak{A}$.

If $\mathfrak{A} = \mathfrak{F}$ then evidently $\hat\xi = \xi$ satisfies properties (1) and (2) and therefore $\mathbf{E}(\xi|\mathfrak{F}) = \xi$.


Definition 4.8.3 Let $\xi$ and $\eta$ be random variables on $(\Omega, \mathfrak{F}, \mathbf{P})$ and $\mathfrak{A} = \sigma(\eta)$ be the $\sigma$-algebra generated by the random variable $\eta$. Then $\mathbf{E}(\xi|\mathfrak{A})$ is also called the conditional expectation of $\xi$ given $\eta$.

To simplify the notation, we will sometimes write $\mathbf{E}(\xi|\eta)$ instead of $\mathbf{E}(\xi|\sigma(\eta))$. This does not lead to confusion.

Since $\mathbf{E}(\xi|\eta)$ is, by definition, a $\sigma(\eta)$-measurable random variable, this means (see Sect. 3.5) that there exists a measurable function $g(x)$ for which $\mathbf{E}(\xi|\eta) = g(\eta)$. By analogy with the discrete case, one can interpret the quantity $g(x)$ as the result of averaging $\xi$ over the set $\{\eta = x\}$. (Recall that in the discrete case $g(x) = \mathbf{E}(\xi|\eta = x)$.)

Definition 4.8.4 If $\xi = I_C$ is the indicator of a set $C \in \mathfrak{F}$, then $\mathbf{E}(I_C|\mathfrak{A})$ is called the conditional probability $\mathbf{P}(C|\mathfrak{A})$ of the event $C$ given $\mathfrak{A}$. If $\mathfrak{A} = \sigma(\eta)$, we speak of the conditional probability $\mathbf{P}(C|\eta)$ of the event $C$ given $\eta$.


4.8.2 Properties of Conditional Expectations 


1. Conditional expectations have the properties of conventional expectations, the 
only difference being that they hold almost surely (with probability 1): 


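(a) $\mathbf{E}(a + b\xi|\mathfrak{A}) = a + b\mathbf{E}(\xi|\mathfrak{A})$.
(b) $\mathbf{E}(\xi_1 + \xi_2|\mathfrak{A}) = \mathbf{E}(\xi_1|\mathfrak{A}) + \mathbf{E}(\xi_2|\mathfrak{A})$.
(c) If $\xi_1 \le \xi_2$ a.s., then $\mathbf{E}(\xi_1|\mathfrak{A}) \le \mathbf{E}(\xi_2|\mathfrak{A})$ a.s.

To prove, for instance, property (a), one needs to verify, according to Definition 4.8.2, that

(1) $a + b\mathbf{E}(\xi|\mathfrak{A})$ is an $\mathfrak{A}$-measurable function;
(2) $\mathbf{E}(a + b\xi; A) = \mathbf{E}(a + b\mathbf{E}(\xi|\mathfrak{A}); A)$ for any $A \in \mathfrak{A}$.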


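Here (1) is evident; (2) follows from the linearity of conventional expectation (or integral).

Property (b) is proved in the same way.

To prove (c), put, for brevity, $\hat\xi_i := \mathbf{E}(\xi_i|\mathfrak{A})$. Then, for any $A \in \mathfrak{A}$,
$$\int_A \hat\xi_1\,d\mathbf{P} = \mathbf{E}(\hat\xi_1; A) = \mathbf{E}(\xi_1; A) \le \mathbf{E}(\xi_2; A) = \int_A \hat\xi_2\,d\mathbf{P}, \qquad \int_A (\hat\xi_2 - \hat\xi_1)\,d\mathbf{P} \ge 0.$$
This implies that $\hat\xi_2 - \hat\xi_1 \ge 0$ a.s.

2. Chebyshev's inequality. If $\xi \ge 0$, $x > 0$, then $\mathbf{P}(\xi \ge x|\mathfrak{A}) \le \mathbf{E}(\xi|\mathfrak{A})/x$.

This property follows from 1(c), since $\mathbf{P}(\xi \ge x|\mathfrak{A}) = \mathbf{E}(I_{\{\xi \ge x\}}|\mathfrak{A})$, where $I_A$ is the indicator of the event $A$, and one has the inequality $I_{\{\xi \ge x\}} \le \xi/x$.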

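3. If $\mathfrak{A}$ and $\sigma(\xi)$ are independent, then $\mathbf{E}(\xi|\mathfrak{A}) = \mathbf{E}\xi$. Since $\hat\xi = \mathbf{E}\xi$ is an $\mathfrak{A}$-measurable function, it remains to verify the second condition from Definition 4.8.2: for any $A \in \mathfrak{A}$,
$$\mathbf{E}(\hat\xi; A) = \mathbf{E}(\xi; A).$$
This equality follows from the independence of the random variables $I_A$ and $\xi$ and the relations $\mathbf{E}(\xi; A) = \mathbf{E}(\xi I_A) = \mathbf{E}\xi\,\mathbf{E}I_A = \mathbf{E}(\hat\xi; A)$.

It follows, in particular, that if $\xi$ and $\eta$ are independent, then $\mathbf{E}(\xi|\eta) = \mathbf{E}\xi$. If the $\sigma$-algebra $\mathfrak{A}$ is trivial, then clearly one also has $\mathbf{E}(\xi|\mathfrak{A}) = \mathbf{E}\xi$.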

4. Convergence theorems that are true for conventional expectations hold for 
conditional expectations as well. For instance, the following assertion is true. 


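Theorem 4.8.2 (Monotone convergence theorem) If $0 \le \xi_n \uparrow \xi$ a.s. then $\mathbf{E}(\xi_n|\mathfrak{A}) \uparrow \mathbf{E}(\xi|\mathfrak{A})$ a.s.

Indeed, it follows from $\xi_{n+1} \ge \xi_n$ a.s. that $\hat\xi_{n+1} \ge \hat\xi_n$ a.s., where $\hat\xi_n = \mathbf{E}(\xi_n|\mathfrak{A})$. Therefore there exists an $\mathfrak{A}$-measurable random variable $\hat\xi$ such that $\hat\xi_n \uparrow \hat\xi$ a.s. By the conventional monotone convergence theorem, for any $A \in \mathfrak{A}$,
$$\int_A \hat\xi_n\,d\mathbf{P} \to \int_A \hat\xi\,d\mathbf{P}, \qquad \int_A \xi_n\,d\mathbf{P} \to \int_A \xi\,d\mathbf{P}.$$
Since the left-hand sides of these relations coincide, the same holds for the right-hand sides. This means that $\hat\xi = \mathbf{E}(\xi|\mathfrak{A})$.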

5. If $\eta$ is an $\mathfrak{A}$-measurable scalar random variable, $\mathbf{E}|\xi| < \infty$, and $\mathbf{E}|\xi\eta| < \infty$, then
$$\mathbf{E}(\eta\xi|\mathfrak{A}) = \eta\,\mathbf{E}(\xi|\mathfrak{A}). \qquad (4.8.3)$$
If $\xi \ge 0$ and $\eta \ge 0$ then the moment conditions are superfluous.

In other words, in regard to the conditional expectation operation, $\mathfrak{A}$-measurable random variables behave as constants in conventional expectations (cf. property 1(a)).

In order to prove (4.8.3), note that if $\eta = I_B$ (the indicator of a set $B \in \mathfrak{A}$) then the assertion holds since, for any $A \in \mathfrak{A}$,
$$\int_A \mathbf{E}(I_B\xi|\mathfrak{A})\,d\mathbf{P} = \int_A I_B\xi\,d\mathbf{P} = \int_{AB} \xi\,d\mathbf{P} = \int_{AB} \mathbf{E}(\xi|\mathfrak{A})\,d\mathbf{P} = \int_A I_B\,\mathbf{E}(\xi|\mathfrak{A})\,d\mathbf{P}.$$
This together with the linearity of conditional expectations implies that the assertion holds for all simple functions $\eta$.

If $\xi \ge 0$ and $\eta \ge 0$ then, taking a sequence of simple functions $0 \le \eta_n \uparrow \eta$ and applying the monotone convergence theorem to the equality
$$\mathbf{E}(\eta_n\xi|\mathfrak{A}) = \eta_n\mathbf{E}(\xi|\mathfrak{A}),$$
we obtain (4.8.3). Transition to the case of arbitrary $\xi$ and $\eta$ is carried out in the standard way, by considering positive and negative parts of the random variables $\xi$ and $\eta$. In addition, to ensure that the arising differences and sums make sense, we require the existence of the expectations $\mathbf{E}|\xi|$ and $\mathbf{E}|\xi\eta|$.

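6. All the basic inequalities for conventional expectations remain true for conditional expectations as well, in particular, Cauchy–Bunjakovsky's inequality
$$\mathbf{E}\big(|\xi_1\xi_2|\,\big|\,\mathfrak{A}\big) \le \big[\mathbf{E}(\xi_1^2|\mathfrak{A})\,\mathbf{E}(\xi_2^2|\mathfrak{A})\big]^{1/2}$$
and Jensen's inequality: if $\mathbf{E}|\xi| < \infty$ then, for any convex function $g$,
$$g\big(\mathbf{E}(\xi|\mathfrak{A})\big) \le \mathbf{E}\big(g(\xi)|\mathfrak{A}\big). \qquad (4.8.4)$$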


Cauchy—Bunjakovsky’s inequality can be proved in exactly the same way as for 
conventional expectations, for its proof requires no properties of expectations other 
than linearity. 

Jensen's inequality is a consequence of the following relation. By convexity of $g(x)$, for any $y$, there exists a number $g^*(y)$ such that $g(x) \ge g(y) + (x - y)g^*(y)$ ($g^*(y) = g'(y)$ if $g$ is differentiable at the point $y$). Put $x = \xi$, $y = \hat\xi = \mathbf{E}(\xi|\mathfrak{A})$, and take conditional expectations of both sides of the inequality. Then, assuming for the moment that
$$\mathbf{E}\big(|(\xi - \hat\xi)g^*(\hat\xi)|\big) < \infty \qquad (4.8.5)$$
(this can be proved if $\mathbf{E}|g(\xi)| < \infty$), we get
$$\mathbf{E}\big[(\xi - \hat\xi)g^*(\hat\xi)\,\big|\,\mathfrak{A}\big] = g^*(\hat\xi)\,\mathbf{E}(\xi - \hat\xi\,|\,\mathfrak{A}) = 0$$
by virtue of property 5. Thus we obtain (4.8.4). In the general case note that the function $g^*(y)$ is nondecreasing. Let $(y_{-N}, y_N)$ be the maximal interval on which $|g^*(y)| < N$. Put
$$g_N(y) = \begin{cases} g(y) & \text{if } y \in [y_{-N}, y_N],\\ g(y_{\pm N}) \pm (y - y_{\pm N})N & \text{if } y \gtrless y_{\pm N}. \end{cases}$$
($y_{\pm N}$ can take infinite values if $\pm g^*(y)$ are bounded as $y \to \pm\infty$. Note that the values of $g^*(y)$ are always bounded from below as $y \to \infty$ and from above as $y \to -\infty$, hence $\pm g^*(y_{\pm N}) \ge 0$ for $N$ large enough.) The support function $g_N^*(y)$ corresponding to $g_N(y)$ has the form
$$g_N^*(y) = \max\big[-N, \min\big(N, g^*(y)\big)\big]$$
and, consequently, is bounded for each $N$. Therefore, condition (4.8.5) is satisfied for $g_N^*(y)$ (recall that $\mathbf{E}|\xi| < \infty$) and hence
$$g_N(\hat\xi) \le \mathbf{E}\big(g_N(\xi)|\mathfrak{A}\big).$$
Further, we have $g_N(y) \uparrow g(y)$ as $N \to \infty$ for all $y$. Therefore the left-hand side of this inequality converges everywhere to $g(\hat\xi)$ as $N \to \infty$, but the right-hand side converges to $\mathbf{E}(g(\xi)|\mathfrak{A})$ by Theorem 4.8.2. Property 6 is proved.


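7. The total probability formula
$$\mathbf{E}\xi = \mathbf{E}\,\mathbf{E}(\xi|\mathfrak{A})$$
follows immediately from property (2) of Definition 4.8.2 with $A = \Omega$.

8. Iterated averaging (an extension of property 7): if $\mathfrak{A} \subset \mathfrak{A}_1 \subset \mathfrak{F}$ then
$$\mathbf{E}(\xi|\mathfrak{A}) = \mathbf{E}\big[\mathbf{E}(\xi|\mathfrak{A}_1)\,\big|\,\mathfrak{A}\big].$$
Indeed, for any $A \in \mathfrak{A}$, since $A \in \mathfrak{A}_1$ one has
$$\int_A \mathbf{E}\big[\mathbf{E}(\xi|\mathfrak{A}_1)\big|\mathfrak{A}\big]\,d\mathbf{P} = \int_A \mathbf{E}(\xi|\mathfrak{A}_1)\,d\mathbf{P} = \int_A \xi\,d\mathbf{P} = \int_A \mathbf{E}(\xi|\mathfrak{A})\,d\mathbf{P}.$$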

The properties 1, 3-5, 7 and 8 clearly hold for both scalar- and vector-valued random variables $\xi$. The next property we will single out.


9. For $\xi \in L_2$, the minimum of $\mathbf{E}(\xi - a(\omega))^2$ over all $\mathfrak{A}$-measurable functions $a(\omega)$ is attained at $a(\omega) = \mathbf{E}(\xi|\mathfrak{A})$.

Indeed, $\mathbf{E}(\xi - a(\omega))^2 = \mathbf{E}\,\mathbf{E}\big((\xi - a(\omega))^2|\mathfrak{A}\big)$, but $a(\omega)$ behaves as a constant in what concerns the operation $\mathbf{E}(\cdot|\mathfrak{A})$ (see property 5), so that
$$\mathbf{E}\big((\xi - a(\omega))^2|\mathfrak{A}\big) = \mathbf{E}\big((\xi - \mathbf{E}(\xi|\mathfrak{A}))^2|\mathfrak{A}\big) + \big(\mathbf{E}(\xi|\mathfrak{A}) - a(\omega)\big)^2$$
and the minimum of this expression is attained at $a(\omega) = \mathbf{E}(\xi|\mathfrak{A})$.

This property proves the equivalence of Definitions 4.8.1 and 4.8.2 in the case when $\xi \in L_2$ (in both definitions, conditional expectation is defined up to its values on a set of measure 0). In this connection note once again that, in $L_2$, the operation of taking conditional expectations is the projection onto $H_{\mathfrak{A}}$ (see our comments to Definition 4.8.1).

Property 9 can be extended to the multivariate case in the following form: for any nonnegative definite matrix $V$, the minimum of $\mathbf{E}(\xi - a(\omega))V(\xi - a(\omega))^{\top}$ over all $\mathfrak{A}$-measurable functions $a(\omega)$ is attained at $a(\omega) = \mathbf{E}(\xi|\mathfrak{A})$.

The assertions proved above in the case where $\xi \in L_2$ and the $\sigma$-algebra $\mathfrak{A}$ is countably generated will surely hold true for an arbitrary $\sigma$-algebra $\mathfrak{A}$, but the substantiation of this fact requires additional work.

In conclusion we note that property 5 admits, under wide assumptions, the fol- 
lowing generalisation: 

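5A. If $\eta$ is $\mathfrak{A}$-measurable and $g(\omega, \eta)$ is a measurable function of its arguments $\omega \in \Omega$ and $\eta \in \mathbb{R}^k$ such that $\mathbf{E}\big|\mathbf{E}(g(\omega,\eta)|\mathfrak{A})\big| < \infty$, then
$$\mathbf{E}\big(g(\omega,\eta)|\mathfrak{A}\big) = \mathbf{E}\big(g(\omega,y)|\mathfrak{A}\big)\big|_{y=\eta}. \qquad (4.8.6)$$
This implies the double expectation (or total probability) formula
$$\mathbf{E}g(\omega,\eta) = \mathbf{E}\Big[\mathbf{E}\big(g(\omega,y)|\mathfrak{A}\big)\big|_{y=\eta}\Big],$$
which can be considered as an extension of Fubini's theorem (see Sects. 4.6 and 3.6). Indeed, if $g(\omega, y)$ is independent of $\mathfrak{A}$, then
$$\mathbf{E}\big(g(\omega,y)|\mathfrak{A}\big) = \mathbf{E}g(\omega,y), \qquad \mathbf{E}\big(g(\omega,\eta)|\mathfrak{A}\big) = \mathbf{E}g(\omega,y)\big|_{y=\eta},$$
$$\mathbf{E}g(\omega,\eta) = \mathbf{E}\big[\mathbf{E}g(\omega,y)\big|_{y=\eta}\big].$$
In regard to its form, this is Fubini's theorem, but here $\eta$ is a vector-valued random variable, while $\omega$ can be of an arbitrary nature.

We will prove property 5A under the simplifying assumption that there exists a sequence of simple functions $\eta_n$ such that $g(\omega, \eta_n) \uparrow g(\omega, \eta)$ and $h(\omega, \eta_n) \uparrow h(\omega, \eta)$ a.s., where $h(\omega, y) = \mathbf{E}(g(\omega, y)|\mathfrak{A})$. Indeed, let $\eta_n = y_k$ for $\omega \in A_k \in \mathfrak{A}$. Then
$$g(\omega,\eta_n) = \sum_k g(\omega, y_k)I_{A_k}.$$
By property 5 it follows that (4.8.6) holds for the functions $\eta_n$. It remains to make use of the monotone convergence theorem (property 4) in the equality $\mathbf{E}(g(\omega,\eta_n)|\mathfrak{A}) = h(\omega,\eta_n)$.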


4.9 Conditional Distributions 


Along with conditional expectations, one can consider conditional distributions 
given sub-o -algebras and random variables. In the present section, we turn our at- 
tention to the latter. 

Let $\xi$ and $\eta$ be two random variables on $(\Omega, \mathfrak{F}, \mathbf{P})$ taking values in $\mathbb{R}^s$ and $\mathbb{R}^k$, respectively, and let $\mathfrak{B}^s$ be the $\sigma$-algebra of Borel sets in $\mathbb{R}^s$.


Definition 4.9.1 A function $F(B|y)$ of two variables $y \in \mathbb{R}^k$ and $B \in \mathfrak{B}^s$ is called the conditional distribution of $\xi$ given $\eta = y$ if:

1. For any $B$, $F(B|\eta)$ is the conditional probability $\mathbf{P}(\xi \in B|\eta)$ of the event $\{\xi \in B\}$ given $\eta$, i.e. $F(B|y)$ is a Borel function of $y$ such that, for any $A \in \mathfrak{B}^k$,
$$\mathbf{E}\big(F(B|\eta);\ \eta \in A\big) = \int_A F(B|y)\,\mathbf{P}(\eta \in dy) = \mathbf{P}(\xi \in B,\ \eta \in A).$$

2. For any $y$, $F(B|y)$ is a probability distribution in $B$.


Sometimes we will write the function $F(B|y)$ in a more "deciphered" form as $F(B|y) = \mathbf{P}(\xi \in B|\eta = y)$.

We know that, for each $B \in \mathfrak{B}^s$, there exists a Borel function $g_B(y)$ such that $g_B(\eta) = \mathbf{P}(\xi \in B|\eta)$. Thus, putting $F(B|y) := g_B(y)$, we will satisfy condition 1 of Definition 4.9.1. Condition 2, however, does not follow from the properties of conditional expectations and by no means needs to hold. Indeed, the conditional probability $\mathbf{P}(\xi \in B|\eta)$ is defined for each $B$ only up to its values on a set $N_B$ of zero measure (so that there exist many variants of conditional expectation), and this set can be different for each $B$. Therefore, if the union
$$N = \bigcup_{B \in \mathfrak{B}^s} N_B$$
has a non-zero probability, it could turn out that, for instance, the equalities
$$\mathbf{P}(\xi \in B_1 \cup B_2|\eta) = \mathbf{P}(\xi \in B_1|\eta) + \mathbf{P}(\xi \in B_2|\eta)$$
(additivity of probability) for all disjoint $B_1$ and $B_2$ from $\mathfrak{B}^s$ hold for no $\omega \in N$, i.e. on an $\omega$-set $N$ of positive probability, the function $g_B(y)$ will not be a distribution as a function of $B$.

However, in the case when $\xi$ is a random variable taking values in $\mathbb{R}^s$ with the $\sigma$-algebra $\mathfrak{B}^s$ of Borel sets, one can always choose $g_B(\eta) = \mathbf{P}(\xi \in B|\eta)$ such that $g_B(y)$ will be a conditional distribution.²

As one might expect, conditional probabilities possess the natural property that 
conditional expectations can be expressed as integrals with respect to conditional 
distributions. 


Theorem 4.9.1 For any measurable function $g(x)$ mapping $\mathbb{R}^s$ into $\mathbb{R}$ such that $\mathbf{E}|g(\xi)| < \infty$, one has
$$\mathbf{E}\big(g(\xi)|\eta\big) = \int g(x)\,F(dx|\eta). \qquad (4.9.1)$$

Proof It suffices to consider the case $g(x) \ge 0$. If $g(x) = I_A(x)$ is the indicator of a set $A$, then formula (4.9.1) clearly holds. Therefore it holds for any simple (i.e. assuming only finitely many values) function $g_n(x)$. It remains to take a sequence $g_n \uparrow g$ and make use of the monotonicity of both sides of (4.9.1) and property 4 from Sect. 4.8.


In real-life problems, to compute conditional distributions one can often use the following simple rule which we will write in the form
$$\mathbf{P}(\xi \in dx\,|\,\eta = y) = \frac{\mathbf{P}(\xi \in dx,\ \eta \in dy)}{\mathbf{P}(\eta \in dy)}. \qquad (4.9.2)$$
Both conditions of Definition 4.9.1 will clearly be formally satisfied. If $\xi$ and $\eta$ have a joint density, this equality will have a precise meaning.


Definition 4.9.2 Assume that, for each $y$, the conditional distribution $F(B|y)$ is absolutely continuous with respect to some measure $\mu$ in $\mathbb{R}^s$:
$$\mathbf{P}(\xi \in B|\eta = y) = \int_B f(x|y)\,\mu(dx).$$
Then the density $f(x|y)$ is said to be the conditional density of $\xi$ (with respect to the measure $\mu$) given $\eta = y$.


² For more details, see e.g. [12, 14, 26].


In other words, a measurable function $f(x|y)$ of two variables $x$ and $y$ is the conditional density of $\xi$ given $\eta = y$ if:

(1) For any Borel sets $A \subset \mathbb{R}^k$ and $B \subset \mathbb{R}^s$,
$$\int_{y \in A}\int_{x \in B} f(x|y)\,\mu(dx)\,\mathbf{P}(\eta \in dy) = \mathbf{P}(\xi \in B,\ \eta \in A). \qquad (4.9.3)$$

(2) For any $y$, the function $f(x|y)$ is a probability density.


It follows from Theorem 4.9.1 that if there exists a conditional density, then
$$\mathbf{E}\big(g(\xi)|\eta\big) = \int g(x)f(x|\eta)\,\mu(dx).$$
If we additionally assume that the distribution of $\eta$ has a density $q(y)$ with respect to some measure $\lambda$ in $\mathbb{R}^k$, then we can re-write (4.9.3) in the form
$$\int_{y \in A}\int_{x \in B} f(x|y)q(y)\,\mu(dx)\,\lambda(dy) = \mathbf{P}(\xi \in B,\ \eta \in A). \qquad (4.9.4)$$
Consider now the direct product of spaces $\mathbb{R}^s$ and $\mathbb{R}^k$ and the direct product of measures $\mu \times \lambda$ on it (if $C = B \times A$, $B \subset \mathbb{R}^s$, $A \subset \mathbb{R}^k$ then $\mu \times \lambda(C) = \mu(B)\lambda(A)$). In the product space, relation (4.9.4) evidently means that the joint distribution of $\xi$ and $\eta$ in $\mathbb{R}^s \times \mathbb{R}^k$ has a density with respect to $\mu \times \lambda$ which is equal to
$$f(x, y) = f(x|y)q(y).$$


The converse assertion is also true. 


Theorem 4.9.2 If the joint distribution of $\xi$ and $\eta$ in $\mathbb{R}^s \times \mathbb{R}^k$ has a density $f(x, y)$ with respect to $\mu \times \lambda$, then the function
$$f(x|y) = \frac{f(x,y)}{q(y)}, \qquad \text{where } q(y) = \int f(x,y)\,\mu(dx),$$
is the conditional density of $\xi$ given $\eta = y$, and the function $q(y)$ is the density of $\eta$ with respect to the measure $\lambda$.

Proof The assertion on $q(y)$ is obvious, since
$$\int_A q(y)\,\lambda(dy) = \mathbf{P}(\eta \in A).$$
It remains to observe that $f(x|y) = f(x, y)/q(y)$ satisfies all the conditions from Definition 4.9.2 of conditional density (equality (4.9.4), which is equivalent to (4.9.3), clearly holds here).


Theorem 4.9.2 gives a precise meaning to (4.9.2) when $\xi$ and $\eta$ have densities.


Example 4.9.1 Let $\xi_1$ and $\xi_2$ be independent random variables, $\xi_1 \subseteqq \mathbf{\Pi}_{\lambda_1}$, $\xi_2 \subseteqq \mathbf{\Pi}_{\lambda_2}$. What is the distribution of $\xi_1$ given $\xi_1 + \xi_2 = n$? We could easily compute the desired conditional probability $\mathbf{P}(\xi_1 = k|\xi_1 + \xi_2 = n)$, $k \le n$, without using Theorem 4.9.2, for $\xi_1 + \xi_2 \subseteqq \mathbf{\Pi}_{\lambda_1 + \lambda_2}$, and the probability of the event $\{\xi_1 + \xi_2 = n\}$ is positive. Retaining this possibility for comparison, we will still make formal use of Theorem 4.9.2. Here $\xi_1$ and $\eta = \xi_1 + \xi_2$ have densities (equal to the corresponding probabilities) with respect to the counting measure, so that
$$f(k, n) = \mathbf{P}(\xi_1 = k,\ \eta = n) = \mathbf{P}(\xi_1 = k,\ \xi_2 = n - k) = \frac{e^{-\lambda_1 - \lambda_2}\lambda_1^k\lambda_2^{n-k}}{k!\,(n-k)!},$$
$$q(n) = \mathbf{P}(\eta = n) = \frac{e^{-\lambda_1 - \lambda_2}(\lambda_1 + \lambda_2)^n}{n!}.$$
Therefore the required density (probability) is equal to
$$f(k|n) = \mathbf{P}(\xi_1 = k|\eta = n) = \frac{f(k,n)}{q(n)} = \frac{n!}{k!\,(n-k)!}\,p^k(1 - p)^{n-k},$$
where $p = \lambda_1/(\lambda_1 + \lambda_2)$. Thus the conditional distribution of $\xi_1$ given the fixed sum $\xi_1 + \xi_2 = n$ is a binomial distribution. In particular, if $\xi_1, \dots, \xi_r$ are independent, $\xi_i \subseteqq \mathbf{\Pi}_{\lambda}$, then the conditional distribution of $\xi_1$ given the fixed sum $\xi_1 + \cdots + \xi_r = n$ will be $\mathbf{B}^n_{1/r}$, which does not depend on $\lambda$.
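
The identity of Example 4.9.1 can be confirmed numerically by computing $f(k|n) = f(k,n)/q(n)$ from the Poisson probabilities and comparing it with the binomial probabilities. This is a minimal sketch; the parameter values and function names are chosen here for illustration only.

```python
import math


def poisson_pmf(lam: float, k: int) -> float:
    """P(xi = k) for the Poisson distribution with parameter lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def conditional_pmf(lam1: float, lam2: float, n: int):
    """f(k|n) = f(k, n) / q(n) for k = 0..n, from the joint and marginal probabilities."""
    q_n = poisson_pmf(lam1 + lam2, n)  # xi_1 + xi_2 is Poisson with parameter lam1 + lam2
    return [poisson_pmf(lam1, k) * poisson_pmf(lam2, n - k) / q_n for k in range(n + 1)]


def binomial_pmf(n: int, p: float):
    """Binomial probabilities with parameters n and p, for k = 0..n."""
    return [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]


if __name__ == "__main__":
    lam1, lam2, n = 2.0, 3.5, 6  # illustrative values
    for c, b in zip(conditional_pmf(lam1, lam2, n), binomial_pmf(n, lam1 / (lam1 + lam2))):
        print(c, b)
```

The two columns agree up to rounding for every $k$, whatever positive values of `lam1` and `lam2` are used.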

The next example answers the same question as in Example 4.9.1 but for nor- 
mally distributed random variables. 


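Example 4.9.2 Let $\mathbf{\Phi}_{a,\sigma^2}$ be the two-dimensional joint normal distribution of random variables $\xi_1$ and $\xi_2$, where $a = (a_1, a_2)$, $a_j = \mathbf{E}\xi_j$, and $\sigma^2 = \|\sigma_{ij}\|$ is the covariance matrix, $\sigma_{ij} = \mathbf{E}(\xi_i - a_i)(\xi_j - a_j)$, $i, j = 1, 2$. The determinant of $\sigma^2$ is
$$|\sigma^2| = \sigma_{11}\sigma_{22} - \sigma_{12}^2 = \sigma_{11}\sigma_{22}(1 - \rho^2),$$
where $\rho$ is the correlation coefficient of $\xi_1$ and $\xi_2$. Thus, if $|\rho| \ne 1$ then the covariance matrix is non-degenerate and has the inverse
$$A = (\sigma^2)^{-1} = \frac{1}{|\sigma^2|}\begin{pmatrix} \sigma_{22} & -\sigma_{12}\\ -\sigma_{12} & \sigma_{11} \end{pmatrix} = \frac{1}{1 - \rho^2}\begin{pmatrix} \dfrac{1}{\sigma_{11}} & -\dfrac{\rho}{\sqrt{\sigma_{11}\sigma_{22}}}\\[2mm] -\dfrac{\rho}{\sqrt{\sigma_{11}\sigma_{22}}} & \dfrac{1}{\sigma_{22}} \end{pmatrix}.$$
Therefore the joint density of $\xi_1$ and $\xi_2$ (with respect to Lebesgue measure) is (see Sect. 3.3)
$$f(x, y) = \frac{1}{2\pi\sqrt{\sigma_{11}\sigma_{22}(1 - \rho^2)}}\exp\bigg\{-\frac{1}{2(1 - \rho^2)}\bigg[\frac{(x - a_1)^2}{\sigma_{11}} - \frac{2\rho(x - a_1)(y - a_2)}{\sqrt{\sigma_{11}\sigma_{22}}} + \frac{(y - a_2)^2}{\sigma_{22}}\bigg]\bigg\}. \qquad (4.9.5)$$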

The one-dimensional densities of $\xi_1$ and $\xi_2$ are, respectively,
$$f(x) = \frac{1}{\sqrt{2\pi\sigma_{11}}}\,e^{-(x - a_1)^2/(2\sigma_{11})}, \qquad q(y) = \frac{1}{\sqrt{2\pi\sigma_{22}}}\,e^{-(y - a_2)^2/(2\sigma_{22})}. \qquad (4.9.6)$$
Hence the conditional density of $\xi_1$ given $\xi_2 = y$ is
$$f(x|y) = \frac{f(x,y)}{q(y)} = \frac{1}{\sqrt{2\pi\sigma_{11}(1 - \rho^2)}}\exp\bigg\{-\frac{1}{2\sigma_{11}(1 - \rho^2)}\bigg(x - a_1 - \rho\sqrt{\frac{\sigma_{11}}{\sigma_{22}}}\,(y - a_2)\bigg)^2\bigg\},$$
which is the density of the normal distribution with mean $a_1 + \rho\sqrt{\sigma_{11}/\sigma_{22}}\,(y - a_2)$ and variance $\sigma_{11}(1 - \rho^2)$.

Fig. 4.2 Illustration to Example 4.9.4. Positions of the target's centre, the first aimpoint, and the first hit

This implies that $f(x|y)$ coincides with the unconditional density $f(x)$ in the case $\rho = 0$ (and hence $\xi_1$ and $\xi_2$ are independent), and that the conditional expectation of $\xi_1$ given $\xi_2$ is
$$\mathbf{E}(\xi_1|\xi_2) = a_1 + \rho\sqrt{\sigma_{11}/\sigma_{22}}\,(\xi_2 - a_2).$$
The straight line $x = a_1 + \rho\sqrt{\sigma_{11}/\sigma_{22}}\,(y - a_2)$ is called the regression line of $\xi_1$ on $\xi_2$. It gives the best mean-square approximation for $\xi_1$ given $\xi_2 = y$.
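
The computation in Example 4.9.2 can be double-checked numerically: dividing the joint density (4.9.5) by the marginal density $q(y)$ from (4.9.6) should reproduce the normal density with the regression mean and variance $\sigma_{11}(1 - \rho^2)$. The sketch below does this at a single point; the parameter values and the function names are illustrative assumptions.

```python
import math


def joint_density(x, y, a1, a2, s11, s22, rho):
    """Two-dimensional normal density (4.9.5) with variances s11, s22 and correlation rho."""
    u = (x - a1) ** 2 / s11
    v = (y - a2) ** 2 / s22
    w = 2.0 * rho * (x - a1) * (y - a2) / math.sqrt(s11 * s22)
    norm = 2.0 * math.pi * math.sqrt(s11 * s22 * (1.0 - rho ** 2))
    return math.exp(-(u - w + v) / (2.0 * (1.0 - rho ** 2))) / norm


def normal_density(x, mean, var):
    """One-dimensional normal density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def conditional_density(x, y, a1, a2, s11, s22, rho):
    """f(x|y) = f(x, y) / q(y), with q the marginal density of the second coordinate."""
    return joint_density(x, y, a1, a2, s11, s22, rho) / normal_density(y, a2, s22)


def regression_mean(y, a1, a2, s11, s22, rho):
    """Value of the regression line a1 + rho sqrt(s11 / s22) (y - a2)."""
    return a1 + rho * math.sqrt(s11 / s22) * (y - a2)


if __name__ == "__main__":
    params = dict(a1=1.0, a2=-2.0, s11=4.0, s22=9.0, rho=0.6)  # illustrative values
    x, y = 0.3, 1.5
    direct = conditional_density(x, y, **params)
    closed_form = normal_density(x, regression_mean(y, **params),
                                 params["s11"] * (1.0 - params["rho"] ** 2))
    print(direct, closed_form)
```

Both printed numbers agree up to rounding, and setting `rho=0.0` makes the conditional density coincide with the unconditional density of the first coordinate.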


Example 4.9.3 Consider the problem of computing the density of the random variable $\xi = \varphi(\zeta, \eta)$ when $\zeta$ and $\eta$ are independent. It follows from formula (4.9.3) with $A = \mathbb{R}^k$ that the density of the distribution of $\xi$ can be expressed in terms of the conditional density $f(x|y)$ as
$$f(x) = \int f(x|y)\,\mathbf{P}(\eta \in dy).$$
In our problem, by $f(x|y)$ one should understand the density of the random variable $\varphi(\zeta, y)$, since $\mathbf{P}(\xi \in B|\eta = y) = \mathbf{P}(\varphi(\zeta, y) \in B)$.


Example 4.9.4 Target shooting with adjustment. A gun fires at a target of a known
geometric form. Introduce the polar system of coordinates, of which the origin is
the position of the gun. The distance $r$ (see Fig. 4.2) from the gun to a certain point
which is assumed to be the centre of the target is precisely known to the crew of the
gun, while the azimuth is not. However, there is a spotter who communicates to the
crew after the first trial shot what the azimuth deviation of the hitting point from
the centre of the target is.

Suppose the scatter of the shells fired by the gun (the deviation $(\xi, \eta)$ of the hitting
point from the aimpoint) is described, in the polar system of coordinates, by the
two-dimensional normal distribution with density (4.9.5) with $a = 0$. In Sect. 8.4 we
will see why the deviation is normally distributed. Here we will neglect the circumstance
that the azimuth deviation $\eta$ cannot exceed $\pi$ while the distance deviation $\xi$
cannot assume values in $(-\infty, -r)$. (The standard deviations $\sigma_1$ and $\sigma_2$ are usually
very small in comparison with $r$ and $\pi$, so this fact is insignificant.) If the azimuth $\beta$
of the centre of the target were also exactly known along with the distance $r$, then
the probability of hitting the target would be equal to

$$\iint_{B(r,\beta)} f(x, y)\,dx\,dy,$$

where $B(r, \beta) = \{(x, y) : (r + x, \beta + y) \in B\}$ and the set $B$ represents the target.
However, the azimuth is communicated to the crew of the gun by the spotter based
on the result of the trial shot, i.e. the spotter reports it with an error $\delta$ distributed
according to the normal law with the density $q(y)$ (see (4.9.6)). What is the probability
of the event $A$ that, in these circumstances, the gun will hit the target with the
second shot? If $\delta = z$, then the azimuth is communicated with the error $z$ and

$$\mathbf{P}(A\,|\,\delta = z) = \iint_{B(r,\beta)} f(x, y - z)\,dx\,dy =: \varphi(z).$$

Therefore,

$$\mathbf{P}(A) = \mathbf{E}[\mathbf{P}(A\,|\,\delta)] = \mathbf{E}\varphi(\delta)
= \frac{1}{\sqrt{2\pi\sigma_{22}}} \int_{-\infty}^{\infty} e^{-z^2/(2\sigma_{22})} \varphi(z)\,dz.$$


Example 4.9.5 The segment $[0, 1]$ is broken “at random” (i.e. with the uniform
distribution of the breaking point) into two parts. Then the larger part is also broken
“at random” into two parts. What is the probability that one can form a triangle from
the three fragments?

The triangle can be formed if there occurs the event $B$ that all the three fragments
have lengths smaller than 1/2. Let $\omega_1$ and $\omega_2$ be the distances from the points of the
first and second breaks to the origin. Use the complete probability formula

$$\mathbf{P}(B) = \mathbf{E}\,\mathbf{P}(B\,|\,\omega_1).$$

Since $\omega_1$ is distributed uniformly over $[0, 1]$, one only has to calculate the
conditional probability $\mathbf{P}(B\,|\,\omega_1)$. If $\omega_1 < 1/2$ then $\omega_2$ is distributed uniformly over
$[\omega_1, 1]$. One can construct a triangle provided that $1/2 < \omega_2 < 1/2 + \omega_1$. Therefore
$\mathbf{P}(B\,|\,\omega_1) = \omega_1/(1 - \omega_1)$ on the set $\{\omega_1 < 1/2\}$. We easily find from symmetry that,
for $\omega_1 > 1/2$,

$$\mathbf{P}(B\,|\,\omega_1) = \frac{1 - \omega_1}{\omega_1}.$$

Hence

$$\mathbf{P}(B) = 2\int_0^{1/2} \frac{x}{1 - x}\,dx = -1 + 2\ln 2.$$


One could also solve this problem using a direct “geometric” method. The density
$f(x, y)$ of the joint distribution of $(\omega_1, \omega_2)$ is

$$f(x, y) = \begin{cases}
\dfrac{1}{1 - x} & \text{if } x < 1/2,\ y \in [x, 1], \\[2mm]
\dfrac{1}{x} & \text{if } x \ge 1/2,\ y \in [0, x], \\[2mm]
0 & \text{otherwise.}
\end{cases}$$

It remains to compute the integral of this function over the domain corresponding
to $B$.
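The value $-1 + 2\ln 2 \approx 0.386$ from Example 4.9.5 can be cross-checked by simulation and by numerical integration of $\mathbf{P}(B\,|\,\omega_1)$. This is a rough sketch only; the seed, the number of trials and the helper names are arbitrary assumptions.

```python
import math
import random

def triangle_possible(rng):
    """Break [0, 1] at random, then break the larger part at random;
    return True if all three fragments are shorter than 1/2."""
    w1 = rng.random()
    if w1 < 0.5:
        w2 = rng.uniform(w1, 1.0)          # the larger part is [w1, 1]
        parts = (w1, w2 - w1, 1.0 - w2)
    else:
        w2 = rng.uniform(0.0, w1)          # the larger part is [0, w1]
        parts = (w2, w1 - w2, 1.0 - w1)
    return max(parts) < 0.5

rng = random.Random(2013)                  # fixed seed for reproducibility
trials = 200_000
estimate = sum(triangle_possible(rng) for _ in range(trials)) / trials

# Midpoint rule for 2 * integral_0^{1/2} x / (1 - x) dx.
steps = 100_000
h = 0.5 / steps
integral = 2 * h * sum((i + 0.5) * h / (1 - (i + 0.5) * h) for i in range(steps))

exact = 2 * math.log(2) - 1
print(estimate, integral, exact)
```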


All the above examples were on conditional expectations given random variables
(not $\sigma$-algebras).

The need for conditional expectations given $\sigma$-algebras arises where it is difficult
to manage working just with conditional expectations given random variables.
Assume, for instance, that a certain process is described by a sequence of random
variables $\{\xi_n\}_{n=-\infty}^{\infty}$ which are not independent. Then the most convenient way to
describe the distribution of $\xi_1$ given the whole “history” (i.e. the values $\xi_0$, $\xi_{-1}$,
$\xi_{-2}, \ldots$) is to take the conditional distribution of $\xi_1$ given $\sigma(\xi_0, \xi_{-1}, \ldots)$. It would
be difficult to confine oneself here to conditional distributions given random variables
only. Respective examples are given in Chaps. 13, 15-22.


Chapter 5 
Sequences of Independent Trials 
with Two Outcomes 


Abstract The weak and strong laws of large numbers are established for the 
Bernoulli scheme in Sect. 5.1. Then the local limit theorem on approximation of 
the binomial probabilities is proved in Sect. 5.2 using Stirling’s formula (covering 
both the normal approximation zone and the large deviations zone). The same sec- 
tion also contains a refinement of that result, including a bound for the relative error 
of the approximation, and an extension of the local limit theorem to polynomial dis- 
tributions. This is followed by the derivation of the de Moivre—Laplace theorem and 
its refinements in Sect. 5.3. In Sect. 5.4, the coupling method is used to prove the 
Poisson theorem for sums of non-identically distributed independent random indica- 
tors, together with sharp approximation error bounds for the total variation distance. 
The chapter ends with derivation of large deviation inequalities for the Bernoulli 
scheme in Sect. 5.5. 


5.1 Laws of Large Numbers 


Suppose we have a sequence of trials in each of which a certain event $A$ can occur
with probability $p$ independently of the outcomes of other trials. Form a sequence
of random variables as follows. Put $\xi_k = 1$ if the event $A$ has occurred in
the $k$-th trial, and $\xi_k = 0$ otherwise. Then $(\xi_k)_{k=1}^{\infty}$ will be a sequence of independent
random variables which are identically distributed according to the Bernoulli
law: $\mathbf{P}(\xi_k = 1) = p$, $\mathbf{P}(\xi_k = 0) = q = 1 - p$, $\mathbf{E}\xi_k = p$, $\mathrm{Var}(\xi_k) = pq$. The sum
$S_n = \xi_1 + \cdots + \xi_n \in \mathbf{B}^n_p$ is simply the number of occurrences of the event $A$ in the
first $n$ trials. Clearly $\mathbf{E}S_n = np$ and $\mathrm{Var}(S_n) = npq$.

The following assertion is called the law of large numbers for the Bernoulli 
scheme. 


Theorem 5.1.1 For any $\varepsilon > 0$

$$\mathbf{P}\Big( \Big| \frac{S_n}{n} - p \Big| > \varepsilon \Big) \to 0 \quad \text{as } n \to \infty.$$
This assertion is a direct consequence of Theorem 4.7.5. One can also obtain the 
following stronger result: 


A.A. Borovkov, Probability Theory, Universitext, 107 
DOI 10.1007/978-1-4471-5201-9_5, © Springer-Verlag London 2013 




Theorem 5.1.2 (The Strong Law of Large Numbers for the Bernoulli Scheme) For
any $\varepsilon > 0$, as $n \to \infty$,

$$\mathbf{P}\Big( \sup_{k \ge n} \Big| \frac{S_k}{k} - p \Big| > \varepsilon \Big) \to 0.$$


The interpretation of this result is that the notion of probability which we introduced
in Chaps. 1 and 2 corresponds to the intuitive interpretation of probability
as the limiting value of the relative frequency of the occurrence of the event. Indeed,
$S_n/n$ could be considered as the relative frequency of the event $A$ for which
$\mathbf{P}(A) = p$. It turned out that, in a certain sense, $S_n/n$ converges to $p$.


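Proof of Theorem 5.1.2 One has

$$\mathbf{P}\Big( \sup_{k \ge n} \Big| \frac{S_k}{k} - p \Big| > \varepsilon \Big)
= \mathbf{P}\Big( \bigcup_{k=n}^{\infty} \Big\{ \Big| \frac{S_k}{k} - p \Big| > \varepsilon \Big\} \Big)
\le \sum_{k=n}^{\infty} \mathbf{P}\Big( \Big| \frac{S_k}{k} - p \Big| > \varepsilon \Big)
\le \sum_{k=n}^{\infty} \frac{\mathbf{E}(S_k - kp)^4}{k^4 \varepsilon^4}. \tag{5.1.1}$$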


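Here we again made use of Chebyshev's inequality but this time for the fourth moments.
Expanding we find that

$$\mathbf{E}(S_k - kp)^4 = \mathbf{E}\Big( \sum_{j=1}^{k} (\xi_j - p) \Big)^4
= \sum_{j=1}^{k} \mathbf{E}(\xi_j - p)^4 + 6 \sum_{i<j} \mathbf{E}(\xi_i - p)^2 (\xi_j - p)^2$$

$$= k(pq^4 + qp^4) + 3k(k - 1)(pq)^2 \le k + k(k - 1) = k^2. \tag{5.1.2}$$

Thus the probability we want to estimate does not exceed the sum

$$\varepsilon^{-4} \sum_{k=n}^{\infty} k^{-2} \to 0 \quad \text{as } n \to \infty.$$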


It is not hard to see that we would not have found the required bound if we used
Chebyshev's inequality with second moments in (5.1.1).

We could also note that one actually has much stronger bounds for
$\mathbf{P}(|S_k - kp| > \varepsilon k)$ than those that we made use of above. These will be derived
in Sect. 5.5.
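The closed form in (5.1.2) can be confirmed in exact rational arithmetic by summing over the binomial distribution directly. The sketch below is illustrative only; the value $p = 3/10$ and the chosen values of $k$ are arbitrary assumptions.

```python
from fractions import Fraction
from math import comb

def fourth_central_moment(k, p):
    """E(S_k - kp)^4 computed directly from the binomial probabilities."""
    q = 1 - p
    return sum(comb(k, s) * p**s * q**(k - s) * (s - k * p)**4
               for s in range(k + 1))

def closed_form(k, p):
    """Right-hand side of (5.1.2) before the final bound by k^2."""
    q = 1 - p
    return k * (p * q**4 + q * p**4) + 3 * k * (k - 1) * (p * q)**2

p = Fraction(3, 10)                      # arbitrary illustrative value
results = [(k, fourth_central_moment(k, p), closed_form(k, p))
           for k in (1, 2, 5, 20)]
for k, direct, formula in results:
    print(k, direct == formula, formula <= k**2)
```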


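Corollary 5.1.1 If $f(x)$ is a continuous function on $[0, 1]$ then, as $n \to \infty$,

$$\mathbf{E} f\Big( \frac{S_n}{n} \Big) \to f(p) \tag{5.1.3}$$

uniformly in $p$.

Proof For any $\varepsilon > 0$,

$$\mathbf{E}\Big| f\Big( \frac{S_n}{n} \Big) - f(p) \Big|
\le \mathbf{E}\Big( \Big| f\Big( \frac{S_n}{n} \Big) - f(p) \Big|;\ \Big| \frac{S_n}{n} - p \Big| \le \varepsilon \Big)
+ \mathbf{E}\Big( \Big| f\Big( \frac{S_n}{n} \Big) - f(p) \Big|;\ \Big| \frac{S_n}{n} - p \Big| > \varepsilon \Big)$$

$$\le \sup_{|x| \le \varepsilon} |f(p + x) - f(p)| + \delta_n(\varepsilon),$$

where the quantity $\delta_n(\varepsilon)$ is independent of $p$ by virtue of (5.1.1), (5.1.2), and
$\delta_n(\varepsilon) \to 0$ as $n \to \infty$.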


Corollary 5.1.2 If $f(x)$ is continuous on $[0, 1]$, then, as $n \to \infty$,

$$\sum_{k=0}^{n} f\Big( \frac{k}{n} \Big) \binom{n}{k} x^k (1 - x)^{n-k} \to f(x)$$

uniformly in $x \in [0, 1]$.

This relation is just a different form of (5.1.3) since

$$\mathbf{P}(S_n = k) = \binom{n}{k} p^k (1 - p)^{n-k}$$

(see Chap. 1). This relation implies the well-known Weierstrass theorem on approximation
of continuous functions by polynomials. Moreover, the required polynomials
are given here explicitly: they are Bernstein polynomials.
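To see the uniform convergence of Bernstein polynomials at work, one can evaluate them on a grid. The sketch below uses $f(x) = |x - 1/2|$ as a test function; this choice, the grid size and the function names are assumptions made only for illustration.

```python
from math import comb

def bernstein(f, n, x):
    """Value at x of the n-th Bernstein polynomial of f, i.e. E f(S_n / n) with p = x."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

def sup_error(f, n, grid=201):
    """Largest deviation |B_n f(x) - f(x)| over an equally spaced grid on [0, 1]."""
    points = [i / (grid - 1) for i in range(grid)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in points)

def f(x):
    return abs(x - 0.5)                  # continuous but not smooth at 1/2

errors = {n: sup_error(f, n) for n in (10, 40, 160)}
for n, err in errors.items():
    print(n, err)
```

For this particular $f$ the grid error decreases as $n$ grows, which is consistent with the corollary; the rate is slow because $f$ has a kink at $1/2$.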


5.2 The Local Limit Theorem and Its Refinements 


5.2.1 The Local Limit Theorem 


We know that $\mathbf{P}(S_n = k) = \binom{n}{k} p^k q^{n-k}$, $q = 1 - p$. However, this formula becomes
very inconvenient for computations with large $n$ and $k$, which raises the question
about the asymptotic behaviour of the probability $\mathbf{P}(S_n = k)$ as $n \to \infty$.
In the sequel, we will write $a_n \sim b_n$ for two number sequences $\{a_n\}$ and $\{b_n\}$ if
$a_n/b_n \to 1$ as $n \to \infty$. Such sequences $\{a_n\}$ and $\{b_n\}$ will be said to be equivalent.
Set

$$H(x) = x \ln\frac{x}{p} + (1 - x) \ln\frac{1 - x}{1 - p}, \qquad p^* = \frac{k}{n}. \tag{5.2.1}$$

Theorem 5.2.1 As $k \to \infty$ and $n - k \to \infty$,

$$\mathbf{P}(S_n = k) = \mathbf{P}\Big( \frac{S_n}{n} = p^* \Big) \sim \frac{1}{\sqrt{2\pi n p^*(1 - p^*)}} \exp\{-nH(p^*)\}. \tag{5.2.2}$$




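Proof We will make use of Stirling's formula according to which $n! \sim \sqrt{2\pi n}\, n^n e^{-n}$
as $n \to \infty$. One has

$$\mathbf{P}(S_n = k) = \binom{n}{k} p^k q^{n-k}
\sim \sqrt{\frac{n}{2\pi k(n - k)}}\, \frac{n^n}{k^k (n - k)^{n-k}}\, p^k (1 - p)^{n-k}$$

$$= \frac{1}{\sqrt{2\pi n p^*(1 - p^*)}}
\exp\Big\{ -k \ln\frac{k}{n} - (n - k) \ln\frac{n - k}{n} + k \ln p + (n - k) \ln(1 - p) \Big\}$$

$$= \frac{1}{\sqrt{2\pi n p^*(1 - p^*)}}
\exp\big\{ -n \big[ p^* \ln p^* + (1 - p^*) \ln(1 - p^*) - p^* \ln p - (1 - p^*) \ln(1 - p) \big] \big\}$$

$$= \frac{1}{\sqrt{2\pi n p^*(1 - p^*)}} \exp\{-nH(p^*)\}.$$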
If $p^* = k/n$ is close to $p$, then one can find another form for the right-hand side
of (5.2.2) which is of significant interest. Note that the function $H(x)$ is analytic on
the interval $(0, 1)$. Since

$$H'(x) = \ln\frac{x}{p} - \ln\frac{1 - x}{1 - p}, \qquad H''(x) = \frac{1}{x} + \frac{1}{1 - x}, \tag{5.2.3}$$

one has $H(p) = H'(p) = 0$ and, as $p^* - p \to 0$,$^1$

$$H(p^*) = \frac{1}{2}\Big( \frac{1}{p} + \frac{1}{q} \Big)(p^* - p)^2 + O(|p^* - p|^3).$$

Therefore if $p^* \sim p$ and $n(p^* - p)^3 \to 0$ then

$$\mathbf{P}(S_n = k) \sim \frac{1}{\sqrt{2\pi npq}} \exp\Big\{ -\frac{n}{2pq}(p^* - p)^2 \Big\}.$$

Putting

$$\Delta = \frac{1}{\sqrt{npq}}, \qquad \varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2},$$

one obtains the following assertion.

Corollary 5.2.1 If $z = n(p^* - p) = k - np = o(n^{2/3})$ then

$$\mathbf{P}(S_n = k) = \mathbf{P}(S_n - np = z) \sim \varphi(z\Delta)\Delta, \tag{5.2.4}$$

where $\varphi = \varphi_{0,1}(x)$ is evidently the density of the normal distribution with parameters
$(0, 1)$.
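The two approximations (5.2.2) and (5.2.4) can be compared with the exact binomial probabilities. The sketch below is a hedged illustration: the values $n = 1000$, $p = 0.3$ and the chosen $k$ are assumptions, and the exact probabilities are computed through `math.lgamma` to avoid overflow.

```python
import math

def log_binom_pmf(n, k, p):
    """Logarithm of the exact binomial probability P(S_n = k)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def H(x, p):
    """The function H from (5.2.1)."""
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

def entropy_approx(n, k, p):
    """Right-hand side of (5.2.2)."""
    ps = k / n
    return math.exp(-n * H(ps, p)) / math.sqrt(2 * math.pi * n * ps * (1 - ps))

def normal_approx(n, k, p):
    """Right-hand side of (5.2.4): phi(z * Delta) * Delta."""
    delta = 1 / math.sqrt(n * p * (1 - p))
    z = k - n * p
    return delta * math.exp(-(z * delta)**2 / 2) / math.sqrt(2 * math.pi)

n, p = 1000, 0.3                          # illustrative values
rows = []
for k in (300, 310, 330, 400):
    exact = math.exp(log_binom_pmf(n, k, p))
    rows.append((k, entropy_approx(n, k, p) / exact, normal_approx(n, k, p) / exact))
for k, ratio_522, ratio_524 in rows:
    print(k, ratio_522, ratio_524)
```

In this sketch the ratio for (5.2.2) stays close to 1 for all listed $k$, whereas the ratio for (5.2.4) drifts away from 1 once $z = k - np$ is no longer small compared with $n^{2/3}$.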


$^1$ According to standard conventions, we will write $a(z) = o(b(z))$ as $z \to z_0$ if $b(z) > 0$ and
$\lim_{z \to z_0} \frac{a(z)}{b(z)} = 0$, and $a(z) = O(b(z))$ as $z \to z_0$ if $b(z) > 0$ and $\limsup_{z \to z_0} \frac{|a(z)|}{b(z)} < \infty$.




This formula also enables one to estimate the probabilities of the events of the
form $\{S_n < k\}$.

If $p^*$ differs substantially from $p$, then one could estimate the probabilities of
such events using the results of Sect. 1.3.


Example 5.2.1 In a jury consisting of an odd number $n = 2m + 1$ of persons, each
member makes a correct decision with probability $p = 0.7$ independently of the
other members. What is the minimum number of members for which the verdict
rendered by the majority of jury members will be correct with a probability of at
least 0.99?

Put $\xi_k = 1$ if the $k$-th jury member made a correct decision and $\xi_k = 0$ otherwise.
We are looking for odd numbers $n$ for which $\mathbf{P}(S_n \le m) \le 0.01$. It is evident that
such a trustworthy decision can be achieved only for large values of $n$. In that case,
as we established in Sect. 1.3, the probability $\mathbf{P}(S_n \le m)$ is approximately equal to

$$\frac{(n + 1 - m)p}{(n + 1)p - m}\, \mathbf{P}(S_n = m) \approx \frac{p}{2p - 1}\, \mathbf{P}(S_n = m).$$

Using Theorem 5.2.1 and the fact that in our problem

$$p^* = \frac{m}{n} = \frac{1}{2} - \frac{1}{2n}, \qquad
H\Big( \frac{1}{2} \Big) = -\frac{1}{2} \ln\big( 4p(1 - p) \big), \qquad
H'\Big( \frac{1}{2} \Big) = \ln\frac{1 - p}{p},$$

we get

$$\mathbf{P}(S_n \le m) \approx \frac{p}{2p - 1} \sqrt{\frac{2}{\pi n}}\, \exp\Big\{ -nH\Big( \frac{1}{2} - \frac{1}{2n} \Big) \Big\}
\approx \frac{p}{2p - 1} \sqrt{\frac{2}{\pi n}}\, \exp\Big\{ -nH\Big( \frac{1}{2} \Big) + \frac{1}{2} H'\Big( \frac{1}{2} \Big) \Big\}$$

$$= \frac{\sqrt{2p(1 - p)}}{(2p - 1)\sqrt{\pi n}}\, \big( 4p(1 - p) \big)^{n/2} \approx 0.915\, \frac{1}{\sqrt{n}}\, (0.84)^{n/2}.$$

On the right-hand side there is a monotonically decreasing function $a(n)$. Solving
the equation $a(n) = 0.01$ we get the answer $n = 33$. The same result will be obtained
if one makes use of the explicit formulae.
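The threshold in Example 5.2.1 can be explored numerically by searching over odd $n$, once with the asymptotic expression $a(n)$ and once with the exact binomial tail. This is a minimal sketch; the function names are assumptions, and the exact search is included so that the two answers can be compared rather than taken for granted.

```python
import math
from math import comb

def wrong_majority_prob(n, p):
    """Exact P(S_n <= m) for odd n = 2m + 1 (the majority is wrong)."""
    m = (n - 1) // 2
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m + 1))

def a(n, p):
    """Asymptotic approximation a(n) derived in the example."""
    q = 1 - p
    return (math.sqrt(2 * p * q) / ((2 * p - 1) * math.sqrt(math.pi * n))
            * (4 * p * q)**(n / 2))

def smallest_odd_n(prob, level=0.01):
    """Smallest odd n with prob(n) <= level, scanning n = 1, 3, 5, ..."""
    n = 1
    while prob(n) > level:
        n += 2
    return n

p = 0.7
n_asymptotic = smallest_odd_n(lambda n: a(n, p))
n_exact = smallest_odd_n(lambda n: wrong_majority_prob(n, p))
print(n_asymptotic, n_exact, wrong_majority_prob(n_exact, p))
```

Since $a(n)$ is only an asymptotic approximation, for such moderate $n$ the exact search may stop one step away from the value given by $a(n)$.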


5.2.2 Refinements of the Local Theorem 


It is not hard to bound the error of approximation (5.2.2). If, in Stirling's formula
$n! = \sqrt{2\pi n}\, n^n e^{-n + \theta(n)}$, we make use of the well-known inequalities$^2$

$$\frac{1}{12n + 1} < \theta(n) < \frac{1}{12n},$$

then the same argument will give the following refinement of Theorem 5.2.1.

$^2$ See, e.g., [12], Sect. 2.9.




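Theorem 5.2.2

$$\mathbf{P}(S_n = k) = \frac{1}{\sqrt{2\pi n p^*(1 - p^*)}} \exp\{-nH(p^*) + \theta(k, n)\}, \tag{5.2.5}$$

where

$$|\theta(k, n)| = |\theta(n) - \theta(k) - \theta(n - k)| < \frac{1}{12k} + \frac{1}{12(n - k)} = \frac{1}{12np^*(1 - p^*)}. \tag{5.2.6}$$

Relation (5.2.4) could also be refined as follows.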


Theorem 5.2.3 For all $k$ such that $|p^* - p| \le \frac{1}{2}\min(p, q)$ one has

$$\mathbf{P}(S_n = k) = \varphi(z\Delta)\Delta\,(1 + \varepsilon(k, n)),$$

where

$$1 + \varepsilon(k, n) = \exp\Big\{ \vartheta \Big( \frac{|z|^3 \Delta^4}{3} + \Big( |z| + \frac{1}{6} \Big)\Delta^2 \Big) \Big\}, \qquad |\vartheta| < 1.$$

As one can easily see from the properties of the Taylor expansion of the function
$e^x$, the order of magnitude of the term $\varepsilon(k, n)$ in the above formulae coincides
with that of the argument of the exponential. Hence it follows from Theorem 5.2.3
that for $z = k - np = o(\Delta^{-4/3})$ or, which is the same, $z = o(n^{2/3})$, one
still has (5.2.4).


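Proof We will make use of Theorem 5.2.2. In addition to formulae (5.2.3) one can
write:

$$H^{(k)}(x) = \frac{(-1)^k (k - 2)!}{x^{k-1}} + \frac{(k - 2)!}{(1 - x)^{k-1}}, \quad k \ge 2, \qquad
H(p^*) = \frac{1}{2pq}(p^* - p)^2 + R_3,$$

where we can estimate the residual $R_3 = \sum_{k=3}^{\infty} \frac{H^{(k)}(p)}{k!}(p^* - p)^k$. Taking into account
that

$$|H^{(k)}(p)| \le (k - 2)! \Big( \frac{1}{p^{k-1}} + \frac{1}{q^{k-1}} \Big), \quad k \ge 2,$$

and letting for brevity $|p^* - p| = \delta$, we get for $\delta \le \frac{1}{2}\min(p, q)$ the bounds

$$|R_3| \le \sum_{k=3}^{\infty} \frac{(k - 2)!}{k!} \Big( \frac{1}{p^{k-1}} + \frac{1}{q^{k-1}} \Big) \delta^k
\le \frac{\delta^3}{6} \Big( \frac{1}{p^2} \cdot \frac{1}{1 - \delta/p} + \frac{1}{q^2} \cdot \frac{1}{1 - \delta/q} \Big)$$

$$\le \frac{\delta^3}{3} \Big( \frac{1}{p^2} + \frac{1}{q^2} \Big) < \frac{\delta^3}{3(pq)^2}.$$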



From this it follows that

$$-nH(p^*) = -\frac{(k - np)^2}{2npq} + \frac{\vartheta_1 |k - np|^3}{3(npq)^2}
= -\frac{z^2 \Delta^2}{2} + \frac{\vartheta_1 |z|^3 \Delta^4}{3}, \qquad |\vartheta_1| < 1. \tag{5.2.7}$$

We now turn to the other factors in equality (5.2.5) and consider the product
$p^*(1 - p^*)$. Since $-p \le 1 - p - p^* \le 1 - p$, we have

$$|p^*(1 - p^*) - p(1 - p)| = |(p - p^*)(1 - p - p^*)| \le |p^* - p| \max(p, q).$$

This implies in particular that, for $|p^* - p| \le \frac{1}{2}\min(p, q)$, one has

$$|p^*(1 - p^*) - pq| \le \frac{1}{2} pq, \qquad p^*(1 - p^*) \ge \frac{1}{2} pq.$$

Therefore one can write along with (5.2.6) that, for the values of $k$ indicated in
Theorem 5.2.3,

$$|\theta(k, n)| < \frac{1}{12np^*(1 - p^*)} \le \frac{1}{6npq} = \frac{\Delta^2}{6}. \tag{5.2.8}$$


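It remains to consider the factor $[p^*(1 - p^*)]^{-1/2}$. Since for $|\gamma| \le 1/2$

$$|\ln(1 + \gamma)| = \Big| \int_1^{1+\gamma} \frac{1}{x}\,dx \Big| < 2|\gamma|,$$

one has for $\delta = |p^* - p| \le (1/2)\min(p, q)$ the relations

$$\ln\big( p^*(1 - p^*) \big) = \ln pq + \ln\Big( 1 + \frac{p^*(1 - p^*) - pq}{pq} \Big)
= \ln(pq) + \ln\Big( 1 + \frac{\vartheta^* \delta}{pq} \Big), \qquad |\vartheta^*| \le \max(p, q);$$

$$\ln\Big( 1 + \frac{\vartheta^* \delta}{pq} \Big) = -\frac{2\vartheta_2 \delta}{pq}, \qquad |\vartheta_2| < \max(p, q),$$

$$[p^*(1 - p^*)]^{-1/2} = [pq]^{-1/2} \exp\Big\{ \frac{\vartheta_2 \delta}{pq} \Big\}. \tag{5.2.9}$$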


Using representations (5.2.7)-(5.2.9) and the assertion of Theorem 5.2.2 com- 
pletes the proof. 


One can see from the above estimates that the bounds for $\vartheta$ in the statement
of Theorem 5.2.3 can be narrowed if we consider smaller deviations $|p^* - p|$, for instance if
they do not exceed the value $\alpha \min(p, q)$ where $\alpha < 1/2$.

The relations for P(S, = k) that we found are the so-called local limit theorems 
for the Bernoulli scheme and their refinements. 
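The bound on $\varepsilon(k, n)$ in Theorem 5.2.3 can be tested numerically over the whole admissible range of $k$. The sketch below computes the largest value of $|\ln(1 + \varepsilon(k, n))|$ relative to the bound $|z|^3\Delta^4/3 + (|z| + 1/6)\Delta^2$; the parameters $n = 500$, $p = 0.3$ and the function names are illustrative assumptions.

```python
import math

def log_binom_pmf(n, k, p):
    """Logarithm of the exact binomial probability P(S_n = k)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def worst_relative_exponent(n, p):
    """Max over admissible k of |ln(1 + eps(k, n))| divided by the bound of Theorem 5.2.3."""
    q = 1 - p
    delta = 1 / math.sqrt(n * p * q)
    worst = 0.0
    for k in range(n + 1):
        if abs(k / n - p) > 0.5 * min(p, q):
            continue                                  # outside the range of the theorem
        z = k - n * p
        log_main = (math.log(delta) - (z * delta)**2 / 2
                    - 0.5 * math.log(2 * math.pi))    # ln(phi(z * Delta) * Delta)
        log_ratio = log_binom_pmf(n, k, p) - log_main # ln(1 + eps(k, n))
        bound = abs(z)**3 * delta**4 / 3 + (abs(z) + 1 / 6) * delta**2
        worst = max(worst, abs(log_ratio) / bound)
    return worst

worst_ratio = worst_relative_exponent(500, 0.3)
print(worst_ratio)   # a value below 1 is consistent with |theta| < 1
```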




5.2.3 The Local Limit Theorem for the Polynomial Distributions 


The basic asymptotic formula given in Theorem 5.2.1 admits a natural extension
to the polynomial distribution $\mathbf{B}^n_p$, $p = (p_1, \ldots, p_r)$, when, in a sequence of independent
trials, in each of the trials one has not two but $r \ge 2$ possible outcomes
$A_1, \ldots, A_r$ of which the probabilities are equal to $p_1, \ldots, p_r$, respectively. Let $S_n^{(j)}$
be the number of occurrences of the event $A_j$ in $n$ trials,

$$S_n = (S_n^{(1)}, \ldots, S_n^{(r)}), \qquad k = (k_1, \ldots, k_r), \qquad p^* = \frac{k}{n},$$

and put $H(x) = \sum_{j} x_j \ln(x_j/p_j)$, $x = (x_1, \ldots, x_r)$. Clearly, $S_n \in \mathbf{B}^n_p$. The following
assertion is a direct extension of Theorem 5.2.1.


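Theorem 5.2.4 If each of the $r$ variables $k_1, \ldots, k_r$ is either zero or tends to $\infty$ as
$n \to \infty$ then

$$\mathbf{P}(S_n = k) \sim (2\pi n)^{(1 - r_0)/2} \Big( \prod_{\substack{j=1 \\ p_j^* \ne 0}}^{r} p_j^* \Big)^{-1/2} \exp\{-nH(p^*)\},$$

where $r_0$ is the number of variables $k_1, \ldots, k_r$ which are not equal to zero.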


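Proof As in the proof of Theorem 5.2.1, we will use Stirling's formula

$$n! \sim \sqrt{2\pi n}\, e^{-n} n^n$$

as $n \to \infty$. Assuming without loss of generality that all $k_j \to \infty$, $j = 1, \ldots, r$, we
get

$$\mathbf{P}(S_n = k) \sim (2\pi)^{(1 - r)/2} \Big( \frac{n}{\prod_{j=1}^{r} k_j} \Big)^{1/2} \prod_{j=1}^{r} \Big( \frac{n p_j}{k_j} \Big)^{k_j}
= (2\pi n)^{(1 - r)/2} \Big( \prod_{j=1}^{r} p_j^* \Big)^{-1/2} \exp\Big\{ n \sum_{j=1}^{r} \frac{k_j}{n} \ln\frac{p_j n}{k_j} \Big\}.$$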


5.3. The de Moivre—Laplace Theorem and Its Refinements 


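Let $a$ and $b$ be two fixed numbers and $\zeta_n = (S_n - np)/\sqrt{npq}$. Then

$$\mathbf{P}(a < \zeta_n < b) = \sum_{a\sqrt{npq} < z < b\sqrt{npq}} \mathbf{P}(S_n - np = z).$$

If, instead of $\mathbf{P}(S_n - np = z)$, we substitute here the values $\varphi(z\Delta)\Delta$ (see Corollary 5.2.1),
we will get an integral sum $\sum_{a < z\Delta < b} \varphi(z\Delta)\Delta$ corresponding to the
integral $\int_a^b \varphi(x)\,dx$.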




Thus relations (5.2.4) make the equality

$$\lim_{n \to \infty} \mathbf{P}(a < \zeta_n < b) = \int_a^b \varphi(x)\,dx = \Phi(b) - \Phi(a) \tag{5.3.1}$$

plausible, where $\Phi(x)$ is the normal distribution function with parameters $(0, 1)$:

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt.$$

This is the de Moivre–Laplace theorem, which is one of the so-called integral limit
theorems that describe probabilities of the form $\mathbf{P}(S_n < x)$. In Chap. 8 we will derive
more general integral theorems from which (5.3.1) will follow as a special case.

Theorem 5.2.3 makes it possible to obtain (5.3.1) together with an error bound 
or, in other words, with a bound for the convergence rate. 
Let $A$ and $B$ be integers,

$$a = \frac{A - np}{\sqrt{npq}}, \qquad b = \frac{B - np}{\sqrt{npq}}. \tag{5.3.2}$$


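Theorem 5.3.1 Let $b > a$, $c = \max(|a|, |b|)$, and

$$\rho = \frac{c^3 + 3c}{3}\,\Delta + \frac{\Delta^2}{6}.$$

If $\Delta = 1/\sqrt{npq} \le 1/2$ and $\rho \le 1/2$ then

$$\mathbf{P}(A \le S_n < B) = \mathbf{P}(a \le \zeta_n < b) = \int_a^b \varphi(t)\,dt\,(1 + \vartheta_1 \Delta c)(1 + 2\vartheta_2 \rho), \tag{5.3.3}$$

where $|\vartheta_i| \le 1$, $i = 1, 2$.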


This theorem shows that the left-hand side in (5.3.3) can be equivalent to $\Phi(b) - \Phi(a)$
for growing $a$ and $b$ as well. In that case, $\Phi(b) - \Phi(a)$ can converge to 0, and
knowing the relative error in (5.3.1) is more convenient since its smallness enables
one to establish that of the absolute error as well, but not vice versa.
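The relative error in the normal approximation can be inspected numerically for concrete integers $A$ and $B$. The sketch below compares the exact probability $\mathbf{P}(A \le S_n < B)$ with $\Phi(b) - \Phi(a)$ and with the largest relative deviation allowed by a two-factor representation of the form (5.3.3). The parameter values are illustrative assumptions, and $\rho$ is taken as $(c^3 + 3c)\Delta/3 + \Delta^2/6$.

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def binom_pmf(n, k, p):
    """Exact binomial probability P(S_n = k), computed via logarithms."""
    return math.exp(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * math.log(p) + (n - k) * math.log(1 - p))

def compare(n, p, A, B):
    """Exact P(A <= S_n < B), its normal approximation, rho, and the relative bound."""
    q = 1 - p
    delta = 1 / math.sqrt(n * p * q)
    a, b = (A - n * p) * delta, (B - n * p) * delta
    c = max(abs(a), abs(b))
    rho = (c**3 + 3 * c) * delta / 3 + delta**2 / 6
    exact = sum(binom_pmf(n, k, p) for k in range(A, B))
    normal = Phi(b) - Phi(a)
    rel_bound = (1 + delta * c) * (1 + 2 * rho) - 1   # largest |ratio - 1| permitted
    return exact, normal, rho, rel_bound

exact, normal, rho, rel_bound = compare(400, 0.5, 195, 206)
print(exact, normal, abs(exact / normal - 1), rel_bound)
```

In this example the observed relative error is far smaller than the bound, which matches the remark below that the bound is effective yet rather rough.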


Proof First we note that, for all $k$ such that $|z| = |k - np| < c\sqrt{npq}$, the conditions
of Theorem 5.2.3 will hold. Indeed, to have the inequality $|p^* - p| \le
(1/2)\min(p, q)$ it suffices that $|k - np| \le npq/2 = 1/(2\Delta^2)$. This inequality will
hold if $c \le 1/(2\Delta)$. But since $\rho \le 1/2$, one has

$$\frac{c(c^2 + 3)\Delta}{3} \le 1/2, \qquad c\Delta < 1/2.$$

Thus, for each $k$ such that $a\sqrt{npq} \le z < b\sqrt{npq}$, we can make use of Theorem 5.2.3
to conclude that




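$$\mathbf{P}(A \le S_n < B) = \sum_{a\sqrt{npq} \le z < b\sqrt{npq}} \mathbf{P}(S_n - np = z)$$

$$= \sum_{a \le z\Delta < b} \varphi(z\Delta)\Delta \Big[ 1 + \Big( \exp\Big\{ \vartheta \Big( \frac{|z|^3 \Delta^4}{3} + \Big( |z| + \frac{1}{6} \Big)\Delta^2 \Big) \Big\} - 1 \Big) \Big], \tag{5.3.4}$$

where $|\vartheta| < 1$. Since, for $\rho \le 1$,

$$\frac{e^{\rho} - 1}{\rho} \le e - 1 < 2,$$

the absolute value of the correction term in (5.3.4) does not exceed (substituting
there $|z|\Delta = c$)

$$\exp\Big\{ \Delta \Big( \frac{c^3}{3} + c + \frac{\Delta}{6} \Big) \Big\} - 1 = e^{\rho} - 1 \le 2\rho.$$

Therefore

$$\mathbf{P}(A \le S_n < B) = \sum_{a \le z\Delta < b} \varphi(z\Delta)\Delta\,[1 + 2\vartheta_2 \rho], \tag{5.3.5}$$

where $|\vartheta_2| < 1$.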
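Now we transform the sum on the right-hand side of the last equality. To this end,
note that, for any smooth function $g(x)$,

$$\Big| \Delta g(x) - \int_x^{x+\Delta} g(t)\,dt \Big| \le \frac{\Delta^2}{2} \max_{x \le t \le x+\Delta} |g'(t)|. \tag{5.3.6}$$

But for the function $\varphi(x) = (2\pi)^{-1/2} e^{-x^2/2}$ one has $\varphi'(x) = -x\varphi(x)$ and the maximum
value of $\varphi(t)$ on the segment $[x, x + \Delta]$, $|x| \le c$, differs from the minimum
value by not more than the factor $\exp\{c\Delta + \Delta^2/2\}$. Therefore, for $|x| \le c$, one has
by virtue of (5.3.6)

$$\Big| \Delta\varphi(x) - \int_x^{x+\Delta} \varphi(t)\,dt \Big|
\le \frac{\Delta^2 c}{2}\, e^{c\Delta + \Delta^2/2} \min_{x \le t \le x+\Delta} \varphi(t)
\le \frac{\Delta c}{2}\, e^{c\Delta + \Delta^2/2} \int_x^{x+\Delta} \varphi(t)\,dt.$$

Since $c\Delta + \Delta^2/2 \le 1/2 + 1/8$, $e^{c\Delta + \Delta^2/2} < 2$, we have the representation

$$\Delta\varphi(x) = \int_x^{x+\Delta} \varphi(t)\,dt\,(1 + \vartheta_1 \Delta c), \qquad |\vartheta_1| < 1.$$

Substituting this into (5.3.5) we obtain the assertion of the theorem.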


Thus by Theorem 5.3.1 the difference

$$\big| \mathbf{P}(x \le \zeta_n < y) - (\Phi(y) - \Phi(x)) \big| \tag{5.3.7}$$

can be effectively, yet rather roughly, bounded from above by a quantity of the order
$1/\sqrt{npq}$ if $x = a$, $y = b$ (assuming that $a$ and $b$ are values which can be represented
in the form $(k - np)\Delta$, see (5.3.2)). If $x$ and $y$ do not belong to the mentioned
lattice with the span $\Delta$ then the error (5.3.7) will still be of the same order since,
for instance, when $y$ varies, $\mathbf{P}(x \le \zeta_n < y)$ remains constant on the semi-intervals
of the form $(a + k\Delta, a + (k + 1)\Delta]$, while the function $\Phi(y) - \Phi(x)$ increases
monotonically with a bounded derivative. A similar argument holds for the left end
point $x$. It is important to note that the error order $1/\sqrt{npq}$ cannot be improved, for
the jumps of the distribution function of $\zeta_n$ are just of this order of magnitude by
Theorem 5.2.2.

Theorem 5.3.1 enables one to use the normal approximation for $\mathbf{P}(x \le \zeta_n < y)$
in the so-called large deviations range as well, when both $x$ and $y$ grow in absolute
value and are of the same sign. In that case, both $\Phi(y) - \Phi(x)$ and the probability
to be approximated tend to zero. Therefore the approximation can be considered
satisfactory only if

$$\frac{\mathbf{P}(x \le \zeta_n < y)}{\Phi(y) - \Phi(x)} \to 1. \tag{5.3.8}$$

As Theorem 5.3.1 shows, this convergence will take place if

$$c = \max(|x|, |y|) = o(\Delta^{-1/3})$$

or, which is the same, $c = o(n^{1/6})$. For more details about large deviation probabilities,
see Chap. 9.

For larger values of $c$, as one could verify using Theorem 5.2.1, relation (5.3.8)
will, generally speaking, not hold.

In conclusion we note that since

$$\mathbf{P}(|\zeta_n| > b) \to 0$$

as $b \to \infty$, it follows immediately from Theorem 5.3.1 that, for any fixed $y$,

$$\lim_{n \to \infty} \mathbf{P}(\zeta_n < y) = \Phi(y).$$

Later we will show that this assertion remains true under much wider assumptions,
when $\zeta_n$ is a scaled sum of arbitrarily distributed random variables having finite
variances.


5.4 The Poisson Theorem and Its Refinements 


5.4.1 Quantifying the Closeness of Poisson Distributions to Those
of the Sums $S_n$


As we saw from the bounds in the last section, the de Moivre–Laplace theorem
gives a good approximation to the probabilities of interest if the number $npq$ (the
variance of $S_n$) is large. This number will grow together with $n$ if $p$ and $q$ are fixed
positive numbers. But what will happen in a problem where, say, $p = 0.001$ and
$n = 1000$ so that $np = 1$? Although $n$ is large here, applying the de Moivre–Laplace
theorem in such a problem would be meaningless. It turns out that in this case the
distribution $\mathbf{P}(S_n = k)$ can be well approximated by the Poisson distribution $\Pi_\mu$
with an appropriate parameter value $\mu$ (see Sect. 5.4.2). Recall that

$$\Pi_\mu(B) = \sum_{0 \le k \in B} \frac{\mu^k}{k!}\, e^{-\mu}.$$

Put $np = \mu$.

Theorem 5.4.1 For all sets $B$,

$$|\mathbf{P}(S_n \in B) - \Pi_\mu(B)| \le \frac{\mu^2}{n}.$$

We could prove this assertion in the same way as the local theorem, making use
of the explicit formula for $\mathbf{P}(S_n = k)$. However, we can prove it in a simpler and
nicer way which could be called the common probability space method, or coupling
method. The method is often used in research in probability theory and consists,
in our case, of constructing on a common probability space random variables $S_n$
and $S_n^*$, the latter being as close to $S_n$ as possible and distributed according to the
Poisson distribution.

It is also important that the common probability space method admits, without 
any complications, extension to the case of non-identically distributed random vari- 
ables, when the probability of getting 1 in a particular trial depends on the number of 
the trial. Thus we will now prove a more general assertion of which Theorem 5.4.1 
is a special case. 

Assume that we are given a sequence of independent random variables $\xi_1, \ldots, \xi_n$,
such that $\xi_j \in \mathbf{B}_{p_j}$. Put, as above, $S_n = \sum_{j=1}^{n} \xi_j$. The theorem we state below is
intended for approximating the probability $\mathbf{P}(S_n = k)$ when $p_j$ are small and the
number $\mu = \sum_{j=1}^{n} p_j$ is “comparable” with 1.

Theorem 5.4.2 For all sets $B$,

$$|\mathbf{P}(S_n \in B) - \Pi_\mu(B)| \le \sum_{j=1}^{n} p_j^2.$$

To prove this theorem we will need an important “stability” property of the Poisson
distribution.

Lemma 5.4.1 If $\eta_1$ and $\eta_2$ are independent, $\eta_1 \in \Pi_{\mu_1}$ and $\eta_2 \in \Pi_{\mu_2}$, then$^3$

$$\eta_1 + \eta_2 \in \Pi_{\mu_1 + \mu_2}.$$

$^3$ This fact will also easily follow from the properties of characteristic functions dealt with in
Chap. 7.




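Proof By the total probability formula,

$$\mathbf{P}(\eta_1 + \eta_2 = k) = \sum_{j=0}^{k} \mathbf{P}(\eta_1 = j)\,\mathbf{P}(\eta_2 = k - j)
= \sum_{j=0}^{k} \frac{\mu_1^j e^{-\mu_1}}{j!} \cdot \frac{\mu_2^{k-j} e^{-\mu_2}}{(k - j)!}$$

$$= \frac{e^{-(\mu_1 + \mu_2)}}{k!} \sum_{j=0}^{k} \binom{k}{j} \mu_1^j \mu_2^{k-j}
= \frac{(\mu_1 + \mu_2)^k e^{-(\mu_1 + \mu_2)}}{k!}.$$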
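Proof of Theorem 5.4.2 Let $\omega_1, \ldots, \omega_n$ be independent random variables, each being
the identity function ($\xi(\omega_k) = \omega_k$) on the unit interval with the uniform distribution.
We can assume that the vector $\omega = (\omega_1, \ldots, \omega_n)$ is given as the identity
function on the unit $n$-dimensional cube $\Omega$ with the uniform distribution.
Now construct the random variables $\xi_j$ and $\xi_j^*$ on $\Omega$ as follows:

$$\xi_j(\omega) = \begin{cases} 0 & \text{if } \omega_j < 1 - p_j, \\ 1 & \text{if } \omega_j \ge 1 - p_j, \end{cases}
\qquad
\xi_j^*(\omega) = \begin{cases} 0 & \text{if } \omega_j < e^{-p_j}, \\ k \ge 1 & \text{if } \omega_j \in [\pi_{k-1}, \pi_k), \end{cases}$$

where $\pi_k = \sum_{m \le k} e^{-p_j} \frac{p_j^m}{m!}$, $k = 0, 1, \ldots$.
It is evident that the $\xi_j(\omega)$ are independent and $\xi_j(\omega) \in \mathbf{B}_{p_j}$; the $\xi_j^*(\omega)$ are also
jointly independent with $\xi_j^*(\omega) \in \Pi_{p_j}$. Now note that since $1 - p_j \le e^{-p_j}$ one has
$\xi_j(\omega) \ne \xi_j^*(\omega)$ only if $\omega_j \in [1 - p_j, e^{-p_j})$ or $\omega_j \in [e^{-p_j} + p_j e^{-p_j}, 1]$. Hence

$$\mathbf{P}(\xi_j \ne \xi_j^*) = (e^{-p_j} - 1 + p_j) + (1 - e^{-p_j} - p_j e^{-p_j}) = p_j(1 - e^{-p_j}) \le p_j^2$$

and

$$\mathbf{P}(S_n \ne S_n^*) \le \mathbf{P}\Big( \bigcup_{j} \{\xi_j \ne \xi_j^*\} \Big) \le \sum_{j} p_j^2,$$

where $S_n^* = \sum_{j=1}^{n} \xi_j^* \in \Pi_\mu$.
Now we can write

$$\mathbf{P}(S_n \in B) = \mathbf{P}(S_n \in B, S_n = S_n^*) + \mathbf{P}(S_n \in B, S_n \ne S_n^*)$$
$$= \mathbf{P}(S_n^* \in B) - \mathbf{P}(S_n^* \in B, S_n \ne S_n^*) + \mathbf{P}(S_n \in B, S_n \ne S_n^*),$$

so that

$$|\mathbf{P}(S_n \in B) - \mathbf{P}(S_n^* \in B)|
= |\mathbf{P}(S_n^* \in B, S_n \ne S_n^*) - \mathbf{P}(S_n \in B, S_n \ne S_n^*)| \le \mathbf{P}(S_n \ne S_n^*). \tag{5.4.1}$$

The assertion of the theorem follows from this in an obvious way.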


Remark 5.4.1 One can give other common probability space constructions as well.
One of them will be used now to show that there exists a better Poisson approximation
to the distribution of $S_n$.

Namely, let $\xi_j^*(\omega)$ be independent random variables distributed according to the
Poisson laws with parameters $r_j = -\ln(1 - p_j) \ge p_j$, so that $\mathbf{P}(\xi_j^* = 0) = e^{-r_j} =
1 - p_j$. Then $\xi_j(\omega) = \min\{1, \xi_j^*(\omega)\} \in \mathbf{B}_{p_j}$ and, moreover,

$$\mathbf{P}\Big( \bigcup_{j=1}^{n} \{\xi_j(\omega) \ne \xi_j^*(\omega)\} \Big) \le \sum_{j=1}^{n} \mathbf{P}(\xi_j^*(\omega) \ge 2) = \sum_{j=1}^{n} \big( 1 - e^{-r_j} - r_j e^{-r_j} \big).$$

But for $r = -\ln(1 - p)$ one has the inequality

$$1 - e^{-r} - re^{-r} = p + (1 - p)\ln(1 - p) \le p + (1 - p)\Big( -p - \frac{p^2}{2} \Big) = \frac{p^2}{2}(1 + p).$$

Hence for the new Poisson approximation we have

$$\mathbf{P}(S_n^* \ne S_n) \le \frac{1}{2} \sum_{j=1}^{n} p_j^2 (1 + p_j).$$

Putting $\lambda = -\sum_{j=1}^{n} \ln(1 - p_j) \ge \sum_{j=1}^{n} p_j$, the same argument as above will lead
to the bound

$$\sup_B |\mathbf{P}(S_n \in B) - \Pi_\lambda(B)| \le \frac{1}{2} \sum_{j=1}^{n} p_j^2 (1 + p_j).$$

This bound of the rate of approximation given by the Poisson distribution with a
“slightly shifted” parameter is better than that obtained in Theorem 5.4.2. Moreover,
one could note that, in the new construction, $\xi_j \le \xi_j^*$, $S_n \le S_n^*$, and consequently

$$\mathbf{P}(S_n \ge k) \le \mathbf{P}(S_n^* \ge k) = \Pi_\lambda([k, \infty)).$$
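Both bounds can be compared with the actual value of $\sup_B |\mathbf{P}(S_n \in B) - \Pi(B)|$, which for distributions on the non-negative integers equals half the sum of the absolute differences of the point probabilities. The sketch below is an illustration under assumed values of $p_j$; the function names are hypothetical.

```python
import math

def poisson_binomial_pmf(ps):
    """Distribution of S_n for independent indicators with success probabilities ps."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, w in enumerate(pmf):
            nxt[k] += w * (1 - p)
            nxt[k + 1] += w * p
        pmf = nxt
    return pmf

def poisson_pmf(mu, kmax):
    """Poisson probabilities for k = 0, ..., kmax."""
    out = [math.exp(-mu)]
    for k in range(1, kmax + 1):
        out.append(out[-1] * mu / k)
    return out

def sup_distance(ps, mu):
    """sup over B of |P(S_n in B) - Pi_mu(B)|, i.e. half the l1 distance."""
    n = len(ps)
    binom = poisson_binomial_pmf(ps)
    pois = poisson_pmf(mu, n)
    tail = max(0.0, 1.0 - sum(pois))          # Poisson mass above n
    return 0.5 * (sum(abs(x - y) for x, y in zip(binom, pois)) + tail)

# Hypothetical non-identical success probabilities between 0.01 and 0.05.
ps = [0.01 * (1 + j % 5) for j in range(60)]
mu = sum(ps)
lam = -sum(math.log(1 - p) for p in ps)

d_mu, bound_mu = sup_distance(ps, mu), sum(p * p for p in ps)
d_lam, bound_lam = sup_distance(ps, lam), 0.5 * sum(p * p * (1 + p) for p in ps)
print(d_mu, bound_mu)
print(d_lam, bound_lam)
```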


5.4.2 The Triangular Array Scheme. The Poisson Theorem 


Now we will return to the case of identically distributed $\xi_k$. To obtain from
Theorem 5.4.2 a limit theorem of the type similar to that of the de Moivre–Laplace
theorem (see (5.3.1)), one needs a somewhat different setup. In fact, to ensure
that $np$ remains bounded as $n$ increases, $p = \mathbf{P}(\xi_k = 1)$ needs to converge to zero
which cannot be the case when we consider a fixed sequence of random variables
$\xi_1, \xi_2, \ldots$.
We introduce a sequence of rows (of growing length) of random variables:

$$\begin{array}{l}
\xi_1^{(1)}; \\
\xi_1^{(2)}, \xi_2^{(2)}; \\
\xi_1^{(3)}, \xi_2^{(3)}, \xi_3^{(3)}; \\
\ldots\ldots\ldots \\
\xi_1^{(n)}, \xi_2^{(n)}, \ldots, \xi_n^{(n)}.
\end{array}$$

This is the so-called triangular array scheme. The superscript denotes the row number,
while the subscript denotes the number of the variable in the row.

Assume that the variables $\xi_k^{(n)}$ in the $n$-th row are independent and $\xi_k^{(n)} \in \mathbf{B}_{p_n}$,
$k = 1, \ldots, n$.

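Corollary 5.4.1 (The Poisson theorem) If $np_n \to \mu > 0$ as $n \to \infty$ then, for each
fixed $k$,

$$\mathbf{P}(S_n = k) \to \Pi_\mu(\{k\}), \tag{5.4.2}$$

where $S_n = \xi_1^{(n)} + \cdots + \xi_n^{(n)}$.

Proof This assertion is an immediate corollary of Theorem 5.4.1. It can also be
obtained directly, by noting that it follows from the equality

$$\mathbf{P}(S_n = k) = \binom{n}{k} p^k (1 - p)^{n-k}$$

that

$$\mathbf{P}(S_n = 0) = e^{n\ln(1 - p)} \sim e^{-\mu}, \qquad
\frac{\mathbf{P}(S_n = k + 1)}{\mathbf{P}(S_n = k)} = \frac{n - k}{k + 1} \cdot \frac{p}{1 - p} \to \frac{\mu}{k + 1}.$$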


Theorem 5.4.2 implies an analogue of the Poisson theorem in a more general
case as well, when the $\xi_j^{(n)}$ are not necessarily identically distributed$^4$ and can take
values different from 0 and 1.

Corollary 5.4.2 Assume that $p_{jn} = \mathbf{P}(\xi_j^{(n)} = 1)$ depend on $n$ and $j$ so that

$$\max_j p_{jn} \to 0, \qquad \sum_{j=1}^{n} p_{jn} \to \mu > 0, \qquad \mathbf{P}(\xi_j^{(n)} = 0) = 1 - p_{jn} + o(p_{jn}).$$

Then (5.4.2) holds.

Proof To prove the corollary, one has to use Theorem 5.4.2 and the fact that

$$\mathbf{P}\Big( \bigcup_{j=1}^{n} \{\xi_j^{(n)} \ne 0,\ \xi_j^{(n)} \ne 1\} \Big) \le \sum_{j=1}^{n} o(p_{jn}) = o(1),$$

which means that, with probability tending to 1, all the variables $\xi_j^{(n)}$ assume the
values 0 and 1 only.


One can clearly obtain from Theorems 5.4.1 and 5.4.2 somewhat stronger assertions
than the above. In particular,

$$\sup_B |\mathbf{P}(S_n \in B) - \Pi_\mu(B)| \to 0 \quad \text{as } n \to \infty.$$


4 An extension of the de Moivre—Laplace theorem to the case of non-identically distributed random 
variables is contained in the central limit theorem from Sect. 8.4. 




Note that under the assumptions of Theorem 5.4.1 this convergence will also
take place in the case where $np \to \infty$ but only if $np^2 \to 0$. At the same time, the
refinement of the de Moivre–Laplace theorem from Sect. 5.3 shows that the normal
approximation for the distribution of $S_n$ holds if $np \to \infty$ (for simplicity we assume
that $p \le q$ so that $npq \ge \frac{1}{2}np \to \infty$).

Thus there exist sequences $p \in \{p : np \to \infty,\ np^2 \to 0\}$ such that both the
normal and the Poisson approximations are valid. In other words, the domains of
applicability of the normal and Poisson approximations overlap.

We see further from Theorem 5.4.1 that the convergence rate in Corollary 5.4.1
is determined by a quantity of the order of $n^{-1}$. Since, as $n \to \infty$,

$$\mathbf{P}(S_n = 0) - \Pi_\mu(\{0\}) = e^{n\ln(1 - p)} - e^{-\mu} \sim -\frac{\mu^2}{2n}\, e^{-\mu},$$

this estimate cannot be substantially improved. However, for large $k$ (in the large
deviations range, say) such an estimate for the difference

$$\mathbf{P}(S_n = k) - \Pi_\mu(\{k\})$$


becomes rough. (This is because, in (5.4.1), we neglected not only the different signs
of the correction terms but also the rare events $\{S_n = k\}$ and $\{S_n^* = k\}$ that appear in
the arguments of the probabilities.) Hence we see, as in Sect. 5.3, the necessity for
having approximations of which both absolute and relative errors are small.

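Now we will show that the asymptotic equivalence relations

$$\mathbf{P}(S_n = k) \sim \Pi_\mu(\{k\})$$

remain valid when $k$ and $\mu$ grow (along with $n$) in such a way that

$$k = o(n^{2/3}), \qquad \mu = o(n^{2/3}), \qquad |k - \mu| = o(\sqrt{n}).$$

Proof Indeed,

$$\mathbf{P}(S_n = k) = \binom{n}{k} p^k (1 - p)^{n-k} = \frac{n(n - 1) \cdots (n - k + 1)}{k!}\, p^k (1 - p)^{n-k}$$

$$= \frac{\mu^k}{k!} \Big( 1 - \frac{1}{n} \Big) \cdots \Big( 1 - \frac{k - 1}{n} \Big) (1 - p)^{n-k}
= \Pi_\mu(\{k\})\, e^{\varepsilon(k, n)}.$$

Thus we have to prove that, for values of $k$ and $\mu$ from the indicated range,

$$\varepsilon(k, n) = \ln\Big[ \Big( 1 - \frac{1}{n} \Big) \cdots \Big( 1 - \frac{k - 1}{n} \Big) (1 - p)^{n-k} e^{pn} \Big] = o(1). \tag{5.4.3}$$

We will obtain this relation together with the form of the correction term. Namely,
we will show that

$$\varepsilon(k, n) = \frac{k - (k - \mu)^2}{2n} + O\Big( \frac{k^3 + \mu^3}{n^2} \Big) \tag{5.4.4}$$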

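and hence

$$\mathbf{P}(S_n = k) = \Big( 1 + \frac{k - (k - \mu)^2}{2n} + O\Big( \frac{k^3 + \mu^3}{n^2} \Big) \Big)\, \Pi_\mu(\{k\}).$$

We make use of the fact that, as $\alpha \to 0$,

$$\ln(1 - \alpha) = -\alpha - \frac{\alpha^2}{2} + O(\alpha^3).$$

Then relations (5.4.3) and (5.4.4) will follow from the equalities

$$\sum_{j=1}^{k-1} \ln\Big( 1 - \frac{j}{n} \Big) = -\sum_{j=1}^{k-1} \frac{j}{n} + O\Big( \frac{k^3}{n^2} \Big) = -\frac{k(k - 1)}{2n} + O\Big( \frac{k^3}{n^2} \Big),$$

$$(n - k)\ln(1 - p) + pn = (n - k)\Big( -p - \frac{p^2}{2} + O(p^3) \Big) + pn
= \frac{k\mu}{n} - \frac{\mu^2}{2n} + O\Big( \frac{k^3 + \mu^3}{n^2} \Big).$$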

In conclusion we note that the approximate Poisson formula

$$\mathbf{P}(S_n = k) \approx \frac{\mu^k}{k!}\, e^{-\mu}$$

is widely used in various applications and has, as experience and the above estimates
show, a rather high accuracy even for moderate values of $n$.
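The leading correction term $(k - (k - \mu)^2)/(2n)$ in the exponent $\varepsilon(k, n)$ can be compared with the exact value of $\ln\big(\mathbf{P}(S_n = k)/\Pi_\mu(\{k\})\big)$. The sketch below does this for a few values of $k$; the parameters $n = 10^4$, $p = 5 \times 10^{-4}$ and the function names are illustrative assumptions.

```python
import math

def log_binom_pmf(n, k, p):
    """Logarithm of the exact binomial probability P(S_n = k)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def log_poisson_pmf(mu, k):
    """Logarithm of the Poisson probability with parameter mu at the point k."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)

def epsilon_exact(n, p, k):
    """Exact exponent eps(k, n) = ln(P(S_n = k) / Pi_mu({k})) with mu = n * p."""
    return log_binom_pmf(n, k, p) - log_poisson_pmf(n * p, k)

def epsilon_leading(n, p, k):
    """Leading term (k - (k - mu)^2) / (2n) of the correction."""
    mu = n * p
    return (k - (k - mu)**2) / (2 * n)

n, p = 10_000, 5e-4                        # so that mu = 5
table = [(k, epsilon_exact(n, p, k), epsilon_leading(n, p, k))
         for k in (0, 2, 5, 8, 12)]
for k, exact_eps, leading_eps in table:
    print(k, exact_eps, leading_eps)
```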

Now we consider several examples of the use of the de Moivre—Laplace and 
Poisson theorems for approximate computations. 


Example 5.4.1 Suppose we are given $10^4$ packets of grain. It is known that there are
5000 tagged grains in the packets. What is the probability that, in a particular fixed
packet, there is at least one tagged grain? We can assume that the tagged grains are
distributed to packets at random. Then the probability that a particular tagged grain
will be in the chosen packet is $p = 10^{-4}$. Since there are 5000 such grains, this
will be the number of trials, i.e. $n = 5000$. Define a random variable $\xi_k$ as follows:
$\xi_k = 1$ if the $k$-th grain is in the chosen packet, and $\xi_k = 0$ otherwise. Then

$$S_{5000} = \sum_{k=1}^{5000} \xi_k$$

will be the number of tagged grains in our packet. By Theorem 5.4.1,
$\mathbf{P}(S_{5000} = 0) \approx e^{-np} = e^{-0.5}$ so that the desired probability is approximately equal to
$1 - e^{-0.5}$. The accuracy of this relation turns out to be rather high (by Theorem 5.4.1,
the error does not exceed $2^{-1} \times 10^{-4}$). If we used the Poisson theorem instead of
Theorem 5.4.1, we would have to imagine a triangular array of Bernoulli random
variables, our $\xi_k$ constituting the 5000-th row of the array. Moreover, we would
assume that, for the $n$-th row, one has $np_n = 0.5$. Thus the conditions of the Poisson
theorem would be met and we could make use of the limit theorem to find the
approximate equality we have already obtained.
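
The numbers in Example 5.4.1 are easy to verify directly. The sketch below is an illustration added here, with variable names chosen only for this purpose: it computes the exact binomial value $1 - (1-p)^n$, the Poisson approximation $1 - e^{-0.5}$, and compares their difference with the error bound $2^{-1} \times 10^{-4}$ quoted in the example.

```python
import math

n = 5000        # number of tagged grains (trials)
p = 1e-4        # probability that a given grain is in the chosen packet

exact = 1.0 - (1.0 - p) ** n        # exact value of P(S_n >= 1)
approx = 1.0 - math.exp(-n * p)     # Poisson approximation 1 - e^{-0.5}
bound = 0.5e-4                      # error bound quoted in the example

print(f"exact  = {exact:.8f}")
print(f"approx = {approx:.8f}")
print(f"|diff| = {abs(exact - approx):.2e}, bound = {bound:.1e}")
```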




Example 5.4.2. A similar argument can be used in the following problem. There are 
n dangerous bacteria in a reservoir of capacity V from which we take a sample of 
volume v < V. What is the probability that we will find the bacteria in the test 
sample? 

One usually assumes that the probability p that any given bacterium will be in the 
test sample is equal to the ratio v/ V. Moreover, it is also assumed that the presence 
of a given bacterium in the sample does not depend on whether the remaining n — 1 
bacteria are in the test sample or not. In other words, one usually postulates that the 
mechanism of bacterial transfer into the test sample is equivalent to a sequence of n 
independent trials with “success” probability equal to p = v/ V in each trial. 

Introducing random variables $\xi_k$ as above, we obtain a description of the number
of bacteria in the test sample by the sum $S_n = \sum_{k=1}^n \xi_k$ in the Bernoulli scheme.
If $nv$ is comparable in magnitude with $V$ then by the Poisson theorem the desired
probability will be equal to

$$\mathbf{P}(S_n > 0) \approx 1 - e^{-nv/V}.$$


Similar models are also used to describe the number of visible stars in a certain
part of the sky far away from the Milky Way. Namely, it is assumed that if there are
$n$ visible stars in a region $R$ then the probability that there are $k$ visible stars in a
subregion $r \subset R$ is

$$\binom{n}{k} p^k (1-p)^{n-k},$$

where $p$ is equal to the ratio $S(r)/S(R)$ of the areas of the regions $r$ and $R$ respectively.


Example 5.4.3 Suppose that the probability that a newborn baby is a boy is constant 
and equals 0.512 (see Sect. 3.4.1). 

Consider a group of $10^4$ newborn babies and assume that it corresponds to a
series of $10^4$ independent trials of which the outcomes are the events that either a
boy or girl is born. What is the probability that the number of boys among these
newborn babies will be greater than the number of girls by at least 200?

Define random variables as follows: $\xi_k = 1$ if the $k$-th baby is a boy and $\xi_k = 0$
otherwise. Then $S_n = \sum_{k=1}^n \xi_k$ is the number of boys in the group. The quantity
$npq \approx 2.5 \times 10^3$ is rather large here, hence applying the integral limit (de Moivre–
Laplace) theorem we obtain for the desired probability the value

$$\mathbf{P}(S_n \ge 5100) \approx 1 - \Phi\Bigl(\frac{5100 - 5120}{\sqrt{npq}}\Bigr) \approx 1 - \Phi(-20/50) = 1 - \Phi(-0.4) \approx 0.66.$$


To find the numerical values of ®(x) one usually makes use of suitable statistical 
computer packages or calculators. 
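
As an illustration of such a computation, here is a minimal sketch (an addition to the text, with hypothetical helper names) that evaluates $\Phi$ through the error function of the Python standard library and compares the normal approximation from Example 5.4.3 with the exact binomial tail obtained by direct summation.

```python
import math

def std_normal_cdf(x):
    """Phi(x) expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def log_binom_pmf(n, p, k):
    """log P(S_n = k), computed with log-gamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def binom_upper_tail(n, p, k):
    """Exact P(S_n >= k) by direct summation of the binomial probabilities."""
    return sum(math.exp(log_binom_pmf(n, p, j)) for j in range(k, n + 1))

n, p = 10_000, 0.512
q = 1.0 - p
normal_value = 1.0 - std_normal_cdf((5100 - n * p) / math.sqrt(n * p * q))
exact_value = binom_upper_tail(n, p, 5100)
print(f"normal approximation: {normal_value:.4f}")
print(f"exact binomial tail : {exact_value:.4f}")
```

The two values should agree to about two decimal places, which is consistent with the rounded answer given in the example.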




In our example, $\Delta = 1/\sqrt{npq} \approx 1/50$, and a satisfactory approximation by the de
Moivre–Laplace formula will certainly be ensured (see Theorem 5.3.1) for $c < 2.5$.
If, however, we have to estimate the probability that the proportion of boys
exceeds 0.55, we will be dealing with large deviation probabilities when to estimate
$\mathbf{P}(S_n > 5500)$ one would rather use the approximate relation obtained in Sect. 1.3
by virtue of which ($k = 0.45n$, $q = 0.488$) one has

$$\mathbf{P}(S_n > 5500) \approx \frac{(n+1-k)q}{(n+1)q - k}\, \mathbf{P}(S_n = 5500).$$


Applying Theorem 5.2.1 we find that

$$\mathbf{P}(S_n > 5500) \approx \frac{0.55q}{q - 0.45}\, \frac{1}{\sqrt{2\pi n\, 0.25}}\, e^{-nH(0.55)} \le \frac{1}{5}\, e^{-25} < 10^{-11}.$$


Thus if we assume for a moment that 100 million babies are born on this planet 
each year and group them into batches of 10 thousand, then, to observe a group in 
which the proportion of boys exceeds the mean value by just 3.8 % we will have to 
wait, on average, 10 million years (see Example 4.1.1 in Sect. 4.1). 
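
The order of magnitude of this large deviation probability can be checked numerically. The sketch below is an illustrative addition: it evaluates $H(0.55)$ for $p = 0.512$, forms an estimate of the same shape as in the example (using $x(1-x)$ with $x = 0.55$ in place of the rounded value 0.25, which is an assumption of this sketch), and compares it with the exact binomial tail $\mathbf{P}(S_n \ge 5500)$ computed by summation. Function and variable names are hypothetical.

```python
import math

def H(x, p):
    """H(x) = x ln(x/p) + (1 - x) ln((1 - x)/(1 - p))."""
    return x * math.log(x / p) + (1.0 - x) * math.log((1.0 - x) / (1.0 - p))

def log_binom_pmf(n, p, k):
    """log P(S_n = k), computed with log-gamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

n, p, k = 10_000, 0.512, 5500
q = 1.0 - p
x = k / n

# Local estimate of P(S_n = k): exp(-n H(x)) / sqrt(2 pi n x (1 - x)).
local = math.exp(-n * H(x, p)) / math.sqrt(2.0 * math.pi * n * x * (1.0 - x))
# Factor (n + 1 - k')q / ((n + 1)q - k') with k' = n - k = 0.45 n (number of girls).
factor = (k + 1) * q / ((n + 1) * q - (n - k))
estimate = factor * local

exact = sum(math.exp(log_binom_pmf(n, p, j)) for j in range(k, n + 1))
print(f"n H(0.55) = {n * H(x, p):.3f}")
print(f"estimate  = {estimate:.3e}")
print(f"exact     = {exact:.3e}")
```

Both values fall well below $10^{-11}$, so the waiting time quoted above is, if anything, an underestimate.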

It is clear that the normal approximation can be used for numerical evaluation of 
probabilities for the problems from Example 5.4.3 provided that the values of np 
are large. 


5.5 Inequalities for Large Deviation Probabilities in the 
Bernoulli Scheme 


In conclusion of the present chapter we will derive several useful inequalities for the 
Bernoulli scheme. In Sect. 5.2 we introduced the function 


$$H(x) = x \ln\frac{x}{p} + (1-x) \ln\frac{1-x}{1-p},$$


which plays an important role in Theorems 5.2.1 and 5.2.2 on the asymptotic
behaviour of the probability $\mathbf{P}(S_n = k)$. We also considered there the basic properties
of this function.


Theorem 5.5.1 For $z \ge 0$,

$$\mathbf{P}(S_n - np \ge z) \le \exp\{-nH(p + z/n)\}, \qquad \mathbf{P}(S_n - np \le -z) \le \exp\{-nH(p - z/n)\}. \qquad (5.5.1)$$

Moreover, for all $p$,

$$H(p + x) \ge 2x^2, \qquad (5.5.2)$$

so that each of the probabilities in (5.5.1) does not exceed $\exp\{-2z^2/n\}$ for any $p$.
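
A quick numerical check of the theorem may be helpful. The following sketch, added for illustration, compares the exact tail $\mathbf{P}(S_n - np \ge z)$ with the bound from (5.5.1) and with the cruder bound $\exp\{-2z^2/n\}$. The parameters $n = 200$, $p = 0.25$ and the function names are assumptions made only for this example.

```python
import math

def H(x, p):
    """H(x) = x ln(x/p) + (1 - x) ln((1 - x)/(1 - p)) for 0 < x < 1."""
    return x * math.log(x / p) + (1.0 - x) * math.log((1.0 - x) / (1.0 - p))

def upper_tail(n, p, k):
    """Exact P(S_n >= k) for the binomial distribution with parameters n, p."""
    return sum(math.comb(n, j) * p**j * (1.0 - p)**(n - j) for j in range(k, n + 1))

def compare_bounds(n, p, np_int, zs):
    """Rows (z, exact tail, bound (5.5.1), exp(-2 z^2 / n)); np_int must equal n*p."""
    rows = []
    for z in zs:
        exact = upper_tail(n, p, np_int + z)
        bound_551 = math.exp(-n * H(p + z / n, p))
        uniform_bound = math.exp(-2.0 * z * z / n)
        rows.append((z, exact, bound_551, uniform_bound))
    return rows

rows = compare_bounds(200, 0.25, 50, (5, 10, 20, 40))
for z, exact, bound_551, uniform_bound in rows:
    print(f"z={z:3d}  exact={exact:.3e}  (5.5.1)={bound_551:.3e}  exp(-2z^2/n)={uniform_bound:.3e}")
```

In every row the three numbers should be ordered as exact tail, bound (5.5.1), uniform bound, and the gap between the last two widens as $z$ grows.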




To compare it with assertion (5.2.2) of Theorem 5.2.1, the first inequality from
Theorem 5.5.1 can be re-written in the form

$$\mathbf{P}\Bigl(\frac{S_n}{n} \ge p^*\Bigr) \le \exp\{-nH(p^*)\}.$$

The inequalities (5.5.1) are close, to some extent, to the de Moivre–Laplace theorem
since, for $z = o(n^{2/3})$,

$$-nH\Bigl(p + \frac{z}{n}\Bigr) = -\frac{z^2}{2npq} + o(1).$$

The last assertion, together with (5.5.2), can be interpreted as follows: deviating by
$z$ or more from the mean value $np$ has the maximum probability when $p = 1/2$.
If $z/\sqrt{n} \to \infty$, then both probabilities in (5.5.1) converge to zero as $n \to \infty$ for
they correspond to large deviations of the sum $S_n$ from the mean $np$. As we have
already said, they are called large deviation probabilities.


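Proof of Theorem 5.5.1 In Corollary 4.7.2 of the previous chapter we established
the inequality

$$\mathbf{P}(\xi \ge x) \le e^{-\lambda x}\, \mathbf{E} e^{\lambda \xi}.$$

Applying it to the sum $S_n$ we get

$$\mathbf{P}(S_n \ge np + z) \le e^{-\lambda(np+z)}\, \mathbf{E} e^{\lambda S_n}.$$

Since $e^{\lambda S_n} = \prod_{k=1}^n e^{\lambda \xi_k}$ and the random variables $e^{\lambda \xi_k}$ are independent,

$$\mathbf{E} e^{\lambda S_n} = \prod_{k=1}^n \mathbf{E} e^{\lambda \xi_k} = (pe^{\lambda} + q)^n = (1 + p(e^{\lambda} - 1))^n,$$

$$\mathbf{P}(S_n \ge np + z) \le \bigl[(1 + p(e^{\lambda} - 1))\, e^{-\lambda(p+\alpha)}\bigr]^n, \qquad \alpha = z/n.$$

The expression in brackets is equal to

$$\mathbf{E} e^{\lambda[\xi_k - (p+\alpha)]} = p e^{\lambda(1-p-\alpha)} + (1-p) e^{-\lambda(p+\alpha)}.$$

Therefore, being the sum of two convex functions, it is a convex function of $\lambda$. The
equation for the minimum point $\lambda(\alpha)$ of the function has the form

$$-(p+\alpha)(1 + p(e^{\lambda} - 1)) + pe^{\lambda} = 0,$$

from which we find that

$$e^{\lambda(\alpha)} = \frac{(p+\alpha)q}{p(q-\alpha)},$$

$$(1 + p(e^{\lambda(\alpha)} - 1))\, e^{-\lambda(\alpha)(p+\alpha)} = \frac{q}{q-\alpha} \Bigl[\frac{p(q-\alpha)}{(p+\alpha)q}\Bigr]^{p+\alpha} = \frac{p^{p+\alpha} q^{q-\alpha}}{(p+\alpha)^{p+\alpha}(q-\alpha)^{q-\alpha}}$$

$$= \exp\Bigl\{-(p+\alpha)\ln\frac{p+\alpha}{p} - (q-\alpha)\ln\frac{q-\alpha}{q}\Bigr\} = \exp\{-H(p+\alpha)\}.$$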




The first of the inequalities (5.5.1) is proved. The second inequality follows from 
the first if we consider the latter as the inequality for the number of zeros. 

It follows further from (5.2.1) that $H(p) = H'(p) = 0$ and $H''(x) = 1/(x(1-x))$.
Since the function $x(1-x)$ attains its maximum value on the interval $[0, 1]$ at the
point $x = 1/2$, one has $H''(x) \ge 4$ and hence

$$H(p + \alpha) \ge \frac{\alpha^2}{2} \cdot 4 = 2\alpha^2.$$


For analogues of Theorem 5.5.1 for sums of arbitrary random variables, see
Chap. 9 and Appendix 8. Example 9.1.2 shows that the function $H(\alpha)$ is the
so-called deviation function for the Bernoulli scheme. This function is important in
describing large deviation probabilities.


Chapter 6 
On Convergence of Random Variables 
and Distributions 


Abstract In this chapter, several different types of convergence used in Probability 
Theory are defined and relationships between them are elucidated. Section 6.1 deals 
with convergence in probability and convergence with probability one (the almost 
sure convergence), presenting some criteria for them and, in particular, discussing 
the concept of Cauchy sequences (in probability and almost surely). Then the conti- 
nuity theorem is established (convergence of functions of random variables) and the 
concept of uniform integrability is introduced and discussed, together with its con- 
sequences (in particular, for convergence in mean of suitable orders). Section 6.2 
contains an extensive discussion of weak convergence of distributions. The chap- 
ter ends with Sect. 6.3 presenting criteria for weak convergence of distributions, 
including the concept of distribution determining classes of functions and that of 
tightness. 


6.1 Convergence of Random Variables 
In previous chapters we have already encountered several assertions which dealt 
with convergence, in some sense, of the distributions of random variables or of the 


random variables themselves. Now we will give definitions of different types of 
convergence and elucidate the relationships between them. 


6.1.1 Types of Convergence 


Let a sequence of random variables $\{\xi_n\}$ and a random variable $\xi$ be given on a
probability space $(\Omega, \mathfrak{F}, \mathbf{P})$.


Definition 6.1.1 The sequence $\{\xi_n\}$ converges in probability¹ to $\xi$ if, for any $\varepsilon > 0$,

$$\mathbf{P}(|\xi_n - \xi| > \varepsilon) \to 0 \quad \text{as } n \to \infty.$$

¹In the set-theoretic terminology, convergence in probability means convergence in measure.


A.A. Borovkov, Probability Theory, Universitext, 129 
DOI 10.1007/978-1-4471-5201-9_6, © Springer-Verlag London 2013 




One writes this as

$$\xi_n \xrightarrow{p} \xi \quad \text{as } n \to \infty.$$

In this notation, the assertion of the law of large numbers for the Bernoulli
scheme could be written as

$$\frac{S_n}{n} \xrightarrow{p} p,$$

since $S_n/n$ can be considered as a sequence of random variables given on a common
probability space.


Definition 6.1.2 We will say that the sequence $\xi_n$ converges to $\xi$ with probability 1
(or almost surely: $\xi_n \to \xi$ a.s., $\xi_n \xrightarrow{a.s.} \xi$), if $\xi_n(\omega) \to \xi(\omega)$ as $n \to \infty$ for all $\omega \in \Omega$
except for $\omega$ from a set $N \subset \Omega$ of null probability: $\mathbf{P}(N) = 0$. This convergence can
also be called convergence almost everywhere (a.e.) with respect to the measure $\mathbf{P}$.


Convergence $\xi_n \xrightarrow{a.s.} \xi$ implies convergence $\xi_n \xrightarrow{p} \xi$. Indeed, if we assume that
the convergence in probability does not take place then there exist $\varepsilon > 0$, $\delta > 0$,
and a sequence $n_k$ such that, for the sequence of events $A_k = \{|\xi_{n_k} - \xi| > \varepsilon\}$,
we have $\mathbf{P}(A_k) \ge \delta$ for all $k$. Let $B$ consist of all elementary events belonging to
infinitely many $A_k$, i.e. $B = \bigcap_{m=1}^{\infty} \bigcup_{k \ge m} A_k$. Then, clearly for $\omega \in B$, the
convergence $\xi_n(\omega) \to \xi(\omega)$ is impossible. But $B = \bigcap_{m=1}^{\infty} B_m$, where $B_m = \bigcup_{k \ge m} A_k$
are decreasing events ($B_{m+1} \subset B_m$), $\mathbf{P}(B_m) \ge \mathbf{P}(A_m) \ge \delta$ and, by the continuity
axiom, $\mathbf{P}(B_m) \to \mathbf{P}(B)$ as $m \to \infty$. Therefore $\mathbf{P}(B) \ge \delta$ and a.s. convergence is
impossible. The obtained contradiction proves the desired statement.


The converse assertion, that convergence in probability implies a.s. convergence, 
is, generally speaking, not true, as we will see below. However in one important 
special case such a converse holds true. 


Theorem 6.1.1 If $\xi_n$ is monotonically increasing or decreasing then convergence
$\xi_n \xrightarrow{p} \xi$ implies that $\xi_n \xrightarrow{a.s.} \xi$.


Proof Assume, without loss of generality, that $\xi \equiv 0$, $\xi_n \ge 0$, $\xi_n \downarrow$ and $\xi_n \xrightarrow{p} \xi$. If
convergence $\xi_n \xrightarrow{a.s.} \xi$ did not hold, there would exist an $\varepsilon > 0$ and a set $A$ with
$\mathbf{P}(A) > \delta > 0$ such that $\sup_{k \ge n} \xi_k > \varepsilon$ for $\omega \in A$ and all $n$. But $\sup_{k \ge n} \xi_k = \xi_n$ and
hence we have

$$\mathbf{P}(\xi_n > \varepsilon) \ge \mathbf{P}(A) > \delta > 0$$

for all $n$, which contradicts the assumed convergence $\xi_n \xrightarrow{p} 0$.


Thus convergence in probability is determined by the behaviour of the numerical
sequence $\mathbf{P}(|\xi_n - \xi| > \varepsilon)$. Is it possible to characterise convergence with probability 1
in a similar way? Set $\zeta_n := \sup_{k \ge n} |\xi_k - \xi|$.




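Corollary 6.1.1 $\xi_n \xrightarrow{a.s.} \xi$ if and only if $\zeta_n \xrightarrow{p} 0$, or, which is the same, when, for
any $\varepsilon > 0$,

$$\mathbf{P}\Bigl(\sup_{k \ge n} |\xi_k - \xi| > \varepsilon\Bigr) \to 0 \quad \text{as } n \to \infty. \qquad (6.1.1)$$


Proof Clearly $\xi_n \to \xi$ a.s. if and only if $\zeta_n \to 0$ a.s. But the sequence $\zeta_n$ decreases
monotonically and it remains to make use of Theorem 6.1.1, which implies that
$\zeta_n \xrightarrow{p} 0$ if and only if $\zeta_n \xrightarrow{a.s.} 0$. The corollary is proved.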


In the above argument, the random variables $\xi_n$ and $\xi$ could be improper, where
the random variables $\xi_n$ and $\xi$ are only defined on a set $B$ and $\mathbf{P}(B) \in (0, 1)$. (These
random variables can take infinite values on $\Omega \setminus B$.) In this case, all the
considerations concerning convergence are carried out on the set $B \subset \Omega$ only.

In the introduced terminology, the assertion of the strong law of large numbers
for the Bernoulli scheme (Theorem 5.1.2) can be stated, by virtue of (6.1.1), as
convergence $S_n/n \to p$ with probability 1.

We have already noted that convergence almost surely implies convergence in
probability. Now we will give an example showing that the converse assertion is,
generally speaking, not true. Let $(\Omega, \mathfrak{F}, \mathbf{P})$ be the unit circle with the $\sigma$-algebra of
Borel sets and uniform distribution. Put $\xi(\omega) \equiv 1$, $\xi_n(\omega) = 2$ on the arc
$[r(n), r(n) + 1/n]$ and $\xi_n(\omega) = 1$ outside the arc. Here $r(n) = \sum_{k=1}^{n} \frac{1}{k}$. It is obvious
that $\xi_n \xrightarrow{p} \xi$. At the same time, $r(n) \to \infty$ as $n \to \infty$, and the set on which $\xi_n$
converges to $\xi$ is empty (we can find no $\omega$ for which $\xi_n(\omega) \to \xi(\omega)$).
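
The mechanism behind this example can be seen in a small computation. The sketch below is an illustrative addition and rests on one assumption not stated in the text: the circle is parametrised by $[0, 1)$, i.e. its circumference is normalised to 1, so that the $n$-th arc has probability $1/n$. For a fixed point $\omega$ it lists the indices $n$ at which $\omega$ lies on the arc, that is, at which $\xi_n(\omega) = 2$. The function name is hypothetical.

```python
import math

def hitting_indices(omega, n_max):
    """Indices n <= n_max for which omega lies on the arc [r(n), r(n) + 1/n] (mod 1)."""
    hits = []
    r = 0.0
    for n in range(1, n_max + 1):
        r += 1.0 / n                     # r(n) = 1 + 1/2 + ... + 1/n
        m = math.ceil(r - omega)         # smallest integer m with omega + m >= r(n)
        if omega + m <= r + 1.0 / n:     # omega + m falls on the n-th arc
            hits.append(n)
    return hits

for omega in (0.0, 0.3, 0.77):
    hits = hitting_indices(omega, 100_000)
    print(omega, len(hits), hits[:5], hits[-1])
```

Since the harmonic sums $r(n)$ grow without bound and consecutive arcs overlap, every $\omega$ keeps being covered again and again, although ever more rarely. This is why $\xi_n(\omega) = 2$ for infinitely many $n$ while $\mathbf{P}(\xi_n \ne \xi) \to 0$.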

However, if $\mathbf{P}(|\xi_n - \xi| > \varepsilon)$ decreases as $n \to \infty$ sufficiently fast, then
convergence in probability will also become a.s. convergence. In particular, relation (6.1.1)
gives the following sufficient condition for convergence with probability 1.


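Theorem 6.1.2 If the series $\sum_{n=1}^{\infty} \mathbf{P}(|\xi_n - \xi| > \varepsilon)$ converges for any $\varepsilon > 0$, then
$\xi_n \to \xi$ a.s.


Proof This assertion is obvious, for

$$\mathbf{P}\Bigl(\bigcup_{k \ge n} \{|\xi_k - \xi| > \varepsilon\}\Bigr) \le \sum_{k \ge n} \mathbf{P}(|\xi_k - \xi| > \varepsilon).$$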


It is this criterion that has actually been used in proving the strong law of large 
numbers for the Bernoulli scheme. 

One cannot deduce a converse assertion about the convergence rate to zero of
the probability $\mathbf{P}(|\xi_n - \xi| > \varepsilon)$ from the a.s. convergence. The reader can easily
construct an example where $\xi_n \to \xi$ a.s., while $\mathbf{P}(|\xi_n - \xi| > \varepsilon)$ converges to zero
arbitrarily slowly.

Theorem 6.1.2 implies the following result. 


Corollary 6.1.2 If $\xi_n \xrightarrow{p} \xi$, then there exists a subsequence $\{n_k\}$ such that $\xi_{n_k} \to \xi$
a.s. as $k \to \infty$.




Proof This assertion is also obvious since it suffices to take $n_k$ such that
$\mathbf{P}(|\xi_{n_k} - \xi| > \varepsilon) \le 1/k^2$ and then make use of Theorem 6.1.2.


There is one more important special case where convergence in probability
$\xi_n \xrightarrow{p} \xi$ implies convergence $\xi_n \to \xi$ a.s. This is the case when the $\xi_n$ are sums
of independent random variables. Namely, the following assertion is true. If
$\xi_n = \sum_{k=1}^{n} \eta_k$, where the $\eta_k$ are independent, then convergence of $\xi_n$ in probability
implies convergence with probability 1. This assertion will be proved in Sect. 11.2.

Finally we consider a third type of convergence of random variables. 


Definition 6.1.3 We will say that $\xi_n$ converges to $\xi$ in the $r$-th order mean (in mean
if $r = 1$; in mean square if $r = 2$) if, as $n \to \infty$,

$$\mathbf{E}|\xi_n - \xi|^r \to 0.$$

This convergence will be denoted by $\xi_n \xrightarrow{(r)} \xi$.


Clearly, by Chebyshev's inequality $\xi_n \xrightarrow{(r)} \xi$ implies that $\xi_n \xrightarrow{p} \xi$. On the other
hand, convergence $\xrightarrow{(r)}$ does not follow from a.s. convergence (and all the more
from convergence in probability). Thus convergence in probability is the weakest of
the three types of convergence we have introduced.


Note that, under additional conditions, convergence $\xi_n \xrightarrow{p} \xi$ can imply that
$\xi_n \xrightarrow{(r)} \xi$ (see Theorem 6.1.7 below). For example, it will be shown in
Corollary 6.1.4 that if $\xi_n \xrightarrow{p} \xi$ and $\mathbf{E}|\xi_n|^{r+\alpha} < c$ for some $\alpha > 0$, $c < \infty$ and all $n$, then
$\xi_n \xrightarrow{(r)} \xi$.


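Definition 6.1.4 A sequence $\xi_n$ is said to be a Cauchy sequence in probability (a.s.,
in mean) if, for any $\varepsilon > 0$,

$$\mathbf{P}(|\xi_n - \xi_m| > \varepsilon) \to 0$$

$$\Bigl(\mathbf{P}\Bigl(\sup_{n \ge m} |\xi_n - \xi_m| > \varepsilon\Bigr) \to 0, \qquad \mathbf{E}|\xi_n - \xi_m|^r \to 0\Bigr)$$

as $n \to \infty$ and $m \to \infty$.


Theorem 6.1.3 (Cauchy convergence test) $\xi_n \to \xi$ in one of the senses $\xrightarrow{p}$, $\xrightarrow{a.s.}$ or
$\xrightarrow{(r)}$ if and only if $\xi_n$ is a Cauchy sequence in the respective sense.


Proof That $\xi_n$ is a Cauchy sequence follows from convergence by virtue of the
inequalities

$$|\xi_n - \xi_m| \le |\xi_n - \xi| + |\xi_m - \xi|,$$

$$\sup_{n \ge m} |\xi_n - \xi_m| \le \sup_{n \ge m} |\xi_n - \xi| + |\xi_m - \xi| \le 2 \sup_{n \ge m} |\xi_n - \xi|,$$

$$|\xi_n - \xi_m|^r \le C_r \bigl(|\xi_n - \xi|^r + |\xi_m - \xi|^r\bigr)$$

for some $C_r$.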




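Now assume that $\xi_n$ is a Cauchy sequence in probability. Choose a sequence $\{n_k\}$
such that

$$\mathbf{P}(|\xi_n - \xi_m| > 2^{-k}) < 2^{-k}$$

for $n \ge n_k$, $m \ge n_k$. Put

$$\xi'_k := \xi_{n_k}, \qquad A_k := \{|\xi'_k - \xi'_{k+1}| > 2^{-k}\}, \qquad \eta := \sum_{k=1}^{\infty} \mathrm{I}(A_k).$$

Then $\mathbf{P}(A_k) \le 2^{-k}$ and $\mathbf{E}\eta = \sum_{k=1}^{\infty} \mathbf{P}(A_k) \le 1$. This means, of course, that the
number of occurrences of the events $A_k$ is a proper random variable: $\mathbf{P}(\eta < \infty) = 1$,
and hence with probability 1 finitely many events $A_k$ occur. This means that, for any
$\omega$ for which $\eta(\omega) < \infty$, there exists a $k_0(\omega)$ such that $|\xi'_k(\omega) - \xi'_{k+1}(\omega)| \le 2^{-k}$
for all $k \ge k_0(\omega)$. Therefore one has the inequality $|\xi'_k(\omega) - \xi'_l(\omega)| \le 2^{-k+1}$ for all
$l \ge k \ge k_0(\omega)$, which means that $\xi'_k(\omega)$ is a numerical Cauchy sequence
and hence there exists a value $\xi(\omega)$ such that $|\xi'_k(\omega) - \xi(\omega)| \to 0$ as $k \to \infty$. This
means, in turn, that $\xi'_k \xrightarrow{a.s.} \xi$ and hence

$$\mathbf{P}(|\xi_n - \xi| \ge \varepsilon) \le \mathbf{P}\Bigl(|\xi_n - \xi_{n_k}| \ge \frac{\varepsilon}{2}\Bigr) + \mathbf{P}\Bigl(|\xi_{n_k} - \xi| \ge \frac{\varepsilon}{2}\Bigr) \to 0$$

as $n \to \infty$ and $k \to \infty$.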

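Now assume that $\xi_n$ is a Cauchy sequence in mean. Then, by Chebyshev's
inequality, it will be a Cauchy sequence in probability and hence, by Corollary 6.1.2,
there will exist a random variable $\xi$ and a subsequence $\{n_k\}$ such that $\xi_{n_k} \xrightarrow{a.s.} \xi$.
Now we will show that $\mathbf{E}|\xi_n - \xi|^r \to 0$. For a given $\varepsilon > 0$, choose an $n$ such that
$\mathbf{E}|\xi_k - \xi_l|^r < \varepsilon$ for $k \ge n$ and $l \ge n$. Then, by Fatou's lemma (see Appendix 3),

$$\mathbf{E}|\xi_n - \xi|^r = \mathbf{E} \lim_{n_k \to \infty} |\xi_n - \xi_{n_k}|^r = \mathbf{E} \liminf_{n_k \to \infty} |\xi_n - \xi_{n_k}|^r \le \liminf_{n_k \to \infty} \mathbf{E}|\xi_n - \xi_{n_k}|^r \le \varepsilon.$$

This means that $\mathbf{E}|\xi_n - \xi|^r \to 0$ as $n \to \infty$.

It remains to verify the assertion of the theorem related to a.s. convergence. We
already know that if $\xi_n$ is a Cauchy sequence in probability (or a.s.) then there exist a
$\xi$ and a subsequence $\xi_{n_k}$ such that $\xi_{n_k} \xrightarrow{a.s.} \xi$. Therefore, if we put
$n_{k(n)} := \min\{n_k : n_k \ge n\}$, then

$$\mathbf{P}\Bigl(\sup_{k \ge n} |\xi_k - \xi| \ge \varepsilon\Bigr) \le \mathbf{P}\Bigl(\sup_{k \ge n} |\xi_k - \xi_{n_{k(n)}}| \ge \varepsilon/2\Bigr) + \mathbf{P}\bigl(|\xi_{n_{k(n)}} - \xi| \ge \varepsilon/2\bigr) \to 0$$

as $n \to \infty$. The theorem is proved.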


Remark 6.1.1 If we introduce the space $L_r$ of all random variables $\xi$ on $(\Omega, \mathfrak{F}, \mathbf{P})$
for which $\mathbf{E}|\xi|^r < \infty$ and the norm $\|\xi\| = (\mathbf{E}|\xi|^r)^{1/r}$ on it (the triangle
inequality $\|\xi_1 + \xi_2\| \le \|\xi_1\| + \|\xi_2\|$ is then nothing else but Minkowski's inequality, see
Theorem 4.7.2), then the assertion of Theorem 6.1.3 on convergence $\xrightarrow{(r)}$ (which
is convergence in the norm of $L_r$, for we identify random variables $\xi_1$ and $\xi_2$ if
$\|\xi_1 - \xi_2\| = 0$) means that $L_r$ is complete and hence is a Banach space.

The space of all random variables on $(\Omega, \mathfrak{F}, \mathbf{P})$ can be metrised so that
convergence in the metric will be equivalent to convergence in probability. For instance,
one could put

$$\rho(\xi_1, \xi_2) = \mathbf{E}\, \frac{|\xi_1 - \xi_2|}{1 + |\xi_1 - \xi_2|}.$$

Since

$$\frac{|x + y|}{1 + |x + y|} \le \frac{|x|}{1 + |x|} + \frac{|y|}{1 + |y|}$$

always holds, $\rho(\xi_1, \xi_2)$ satisfies all the axioms of a metric. It is not difficult to see
that relations $\rho(\xi_n, \xi) \to 0$ and $\xi_n \xrightarrow{p} \xi$ are equivalent. The assertion of
Theorem 6.1.3 related to convergence $\xrightarrow{p}$ means that the metric space we introduced
is complete.


6.1.2 The Continuity Theorem 
Now we will derive the following “continuity theorem”. 


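Theorem 6.1.4 Let $\xi_n \xrightarrow{a.s.} \xi$ ($\xi_n \xrightarrow{p} \xi$) and $H(s)$ be a function continuous
everywhere with respect to the distribution of the random variable $\xi$ (i.e. $H(s)$ is
continuous at each point of a set $S$ such that $\mathbf{P}(\xi \in S) = 1$). Then

$$H(\xi_n) \xrightarrow{a.s.} H(\xi) \qquad \bigl(H(\xi_n) \xrightarrow{p} H(\xi)\bigr).$$


Proof Let $\xi_n \xrightarrow{a.s.} \xi$. Since the sets $A = \{\omega : \xi_n(\omega) \to \xi(\omega)\}$ and $B = \{\omega : \xi(\omega) \in S\}$
are both of probability 1, $\mathbf{P}(AB) = \mathbf{P}(A) + \mathbf{P}(B) - \mathbf{P}(A \cup B) = 1$. But one has
$H(\xi_n) \to H(\xi)$ on the set $AB$. Convergence with probability 1 is proved.


Now let $\xi_n \xrightarrow{p} \xi$. If we assume that convergence $H(\xi_n) \xrightarrow{p} H(\xi)$ does not take
place then there will exist $\varepsilon > 0$, $\delta > 0$ and a subsequence $\{n'\}$ such that

$$\mathbf{P}\bigl(|H(\xi_{n'}) - H(\xi)| > \varepsilon\bigr) > \delta.$$

But $\xi_{n'} \xrightarrow{p} \xi$ and hence there exists a subsequence $\{n''\}$ such that $\xi_{n''} \xrightarrow{a.s.} \xi$ and
$H(\xi_{n''}) \xrightarrow{a.s.} H(\xi)$. This contradicts the assumption we made, for the latter implies
that

$$\mathbf{P}\bigl(|H(\xi_{n''}) - H(\xi)| > \varepsilon\bigr) > \delta.$$

The theorem is proved.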


6.1.3 Uniform Integrability and Its Consequences 


Now we will consider this question: in what cases does convergence in probability 
imply convergence in mean? 




The main condition that ensures the transition from convergence in probability 
to convergence in mean is associated with the notion of uniform integrability. 

Definition 6.1.5 A sequence $\{\xi_n\}$ is said to be uniformly integrable if

$$\sup_n \mathbf{E}(|\xi_n|;\ |\xi_n| > N) \to 0 \quad \text{as } N \to \infty.$$


A sequence of independent identically distributed random variables with finite 
mean is, clearly, uniformly integrable. 

If $\{\xi_n\}$ is uniformly integrable then so are $\{c\xi_n\}$ and $\{\xi_n + c\}$, where $c = \mathrm{const}$.

Let us present some further, less evident, properties of uniform integrability. 


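U1. If the sequences $\{\xi'_n\}$ and $\{\xi''_n\}$ are uniformly integrable then the sequences
defined by $\zeta_n = \max(|\xi'_n|, |\xi''_n|)$ and $\zeta_n = \xi'_n + \xi''_n$ are also uniformly integrable.


Proof Indeed, for $\zeta_n = \max(|\xi'_n|, |\xi''_n|)$ we have

$$\mathbf{E}(\zeta_n;\ \zeta_n > N) = \mathbf{E}(\zeta_n;\ \zeta_n > N,\ |\xi'_n| > |\xi''_n|) + \mathbf{E}(\zeta_n;\ \zeta_n > N,\ |\xi'_n| \le |\xi''_n|)$$

$$\le \mathbf{E}(|\xi'_n|;\ |\xi'_n| > N) + \mathbf{E}(|\xi''_n|;\ |\xi''_n| > N) \to 0$$

as $N \to \infty$.
Since

$$|\xi'_n + \xi''_n| \le |\xi'_n| + |\xi''_n| \le 2 \max(|\xi'_n|, |\xi''_n|),$$

from the above it follows that the sequence defined by the sum $\zeta_n = \xi'_n + \xi''_n$ is also
uniformly integrable.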


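U2. If $\{\xi_n\}$ is uniformly integrable then $\sup_n \mathbf{E}|\xi_n| \le c < \infty$.


Proof Indeed, choose $N$ so that

$$\sup_n \mathbf{E}(|\xi_n|;\ |\xi_n| > N) \le 1.$$

Then

$$\sup_n \mathbf{E}|\xi_n| = \sup_n \bigl[\mathbf{E}(|\xi_n|;\ |\xi_n| \le N) + \mathbf{E}(|\xi_n|;\ |\xi_n| > N)\bigr] \le N + 1.$$


The converse assertion is not true. For example, for a sequence

$$\xi_n : \quad \mathbf{P}(\xi_n = n) = 1/n = 1 - \mathbf{P}(\xi_n = 0)$$

one has $\mathbf{E}|\xi_n| = 1$, but the sequence is not uniformly integrable.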
If we somewhat strengthen the above statement U2, it becomes “characteristic” 
for uniform integrability. 
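
The counterexample to the converse of U2 can be made fully explicit. The following sketch, an illustrative addition with hypothetical function names, computes $\mathbf{E}(|\xi_n|;\ |\xi_n| > N)$ exactly for the two-point variables with $\mathbf{P}(\xi_n = n) = 1/n$ and shows that the supremum over $n$ does not decrease as $N$ grows. Only a finite range of $n$ is scanned, which already suffices here.

```python
from fractions import Fraction

def tail_expectation(n, N):
    """E(|xi_n|; |xi_n| > N) when xi_n = n with probability 1/n and 0 otherwise."""
    return n * Fraction(1, n) if n > N else Fraction(0)

def sup_tail(N, n_max):
    """Maximum of the tail expectations over n = 1, ..., n_max."""
    return max(tail_expectation(n, N) for n in range(1, n_max + 1))

for N in (1, 10, 100, 1000):
    # Scanning n up to 10 N + 1 is enough to reach an index n > N.
    print(N, sup_tail(N, 10 * N + 1))
```

For every $N$ the reported value is 1, so the defining condition of uniform integrability fails although all the means equal 1.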


Theorem 6.1.5 For a sequence $\{\xi_n\}$ to be uniformly integrable, it is necessary and
sufficient that there exists a function $\psi(x)$ such that

$$\frac{\psi(x)}{x} \uparrow \infty \quad \text{as } x \uparrow \infty, \qquad \sup_n \mathbf{E}\psi(|\xi_n|) < c < \infty. \qquad (6.1.2)$$

In the necessity assertion one can choose a convex function $\psi$.




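Proof Without loss of generality we can assume that $\xi_n \ge 0$.

The sufficiency is evident, since, putting $v(x) := \psi(x)/x$, we get

$$\mathbf{E}(\xi_n;\ \xi_n \ge N) \le \frac{1}{v(N)}\, \mathbf{E}(\xi_n v(\xi_n);\ \xi_n \ge N) \le \frac{c}{v(N)}.$$

To prove the necessity, put

$$\varepsilon(N) := \sup_n \mathbf{E}(\xi_n;\ \xi_n \ge N).$$

Then, by virtue of uniform integrability, $\varepsilon(N) \downarrow 0$ as $N \uparrow \infty$. Choose a sequence
$N_k \uparrow \infty$ as $k \uparrow \infty$ such that

$$\sum_{k=1}^{\infty} \sqrt{\varepsilon(N_k)} < c_1 < \infty,$$

and put

$$g(x) = x\, (\varepsilon(N_k))^{-1/2} \quad \text{for } x \in [N_k, N_{k+1}).$$

Since

$$\frac{g(N_k - 0)}{N_k} = (\varepsilon(N_{k-1}))^{-1/2} \le (\varepsilon(N_k))^{-1/2} = \frac{g(N_k)}{N_k},$$

we have $g(x)/x \uparrow \infty$ as $x \to \infty$. Further,

$$\mathbf{E} g(\xi_n) = \sum_k \mathbf{E}\bigl[g(\xi_n);\ \xi_n \in [N_k, N_{k+1})\bigr] = \sum_k \mathbf{E}\bigl[\xi_n (\varepsilon(N_k))^{-1/2};\ \xi_n \in [N_k, N_{k+1})\bigr]$$

$$\le \sum_k (\varepsilon(N_k))^{-1/2}\, \varepsilon(N_k) = \sum_k \sqrt{\varepsilon(N_k)} < c_1,$$

where the right-hand side does not depend on $n$. Therefore, to prove the theorem it
is sufficient to construct a function $\psi \le g$ which is convex and such that $\psi(x)/x \uparrow \infty$
as $x \uparrow \infty$.

Define the function $\psi(x)$ as the continuous polygon with nodes $(N_k, g(N_k - 0))$.
Since

$$\frac{g(N_k - 0)}{N_k} = \varepsilon(N_{k-1})^{-1/2}$$

monotonically increases as $k$ grows, $\psi$ is a lower envelope curve for the discontinuous
function $g(x) \ge \psi(x)$. The monotonicity of $\psi(x)/x$ follows from the fact that, on
the interval $[N_k, N_{k+1})$, this function can be represented as

$$\frac{\psi(x)}{x} = a_{k,\psi} - \frac{b_k}{x},$$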



where $b_k > 0$, because the values $\psi(N_{k+1})$ and $g(N_{k+1} - 0)$ coincide, while the
angular incline $a_{k,\psi}$ of the function $\psi$ on the interval $[N_k, N_{k+1})$ is greater than the
"radial" incline $a_{k,g}$ of the function $g$:

$$a_{k,g} = \frac{g(N_{k+1} - 0) - g(N_k)}{N_{k+1} - N_k} \le \frac{g(N_{k+1} - 0) - g(N_k - 0)}{N_{k+1} - N_k} = a_{k,\psi}.$$

It is clear that $\psi(x)/x$ increases unboundedly, for

$$\frac{\psi(N_k)}{N_k} = \frac{g(N_k - 0)}{N_k} = \varepsilon(N_{k-1})^{-1/2} \uparrow \infty$$

as $k \to \infty$. The theorem is proved.


In studying the mean values of sums of random variables, the following theorem 
on uniform integrability of average values, following from Theorem 6.1.5, plays an 
important role. 


Theorem 6.1.6 Let $\xi_1, \xi_2, \ldots$ be an arbitrary uniformly integrable sequence of
random variables,

$$p_{i,n} \ge 0, \qquad \sum_{i=1}^{n} p_{i,n} = 1, \qquad \zeta_n = \sum_{i=1}^{n} |\xi_i|\, p_{i,n}.$$

Then the sequence $\{\zeta_n\}$ is uniformly integrable as well.


Proof Let $\psi(x)$ be the convex function from Theorem 6.1.5 satisfying
properties (6.1.2). Then, by that theorem,

$$\mathbf{E}\psi(\zeta_n) = \mathbf{E}\psi\Bigl(\sum_{i=1}^{n} p_{i,n} |\xi_i|\Bigr) \le \mathbf{E} \sum_{i=1}^{n} p_{i,n}\, \psi(|\xi_i|) \le c.$$

It remains to make use of Theorem 6.1.5 again.


Now we will show that convergence in probability together with uniform inte- 
grability imply convergence in mean. 


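Theorem 6.1.7 Let $\xi_n \xrightarrow{p} \xi$ and $\{\xi_n\}$ be uniformly integrable. Then $\mathbf{E}|\xi|$ exists and,
as $n \to \infty$,

$$\mathbf{E}|\xi_n - \xi| \to 0.$$

If, moreover, $\{|\xi_n|^r\}$ is uniformly integrable then $\xi_n \xrightarrow{(r)} \xi$.

Conversely, if, for an $r \ge 1$, $\xi_n \xrightarrow{(r)} \xi$ and $\mathbf{E}|\xi|^r < \infty$, then $\{|\xi_n|^r\}$ is uniformly
integrable.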


In the law of large numbers for the Bernoulli scheme (see Theorem 5.1.1) we
proved that the normed sum $S_n/n$ converges to $p$ in probability. Since $0 \le S_n/n \le 1$,
$S_n/n$ is clearly uniformly integrable and the convergence in mean
$\mathbf{E}|S_n/n - p|^r \to 0$ holds for any $r$. This fact can also be established directly.
For a more substantial example of application of Theorems 6.1.6 and 6.1.7, see
Sect. 8.1.
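
The convergence in mean for the Bernoulli scheme can be observed directly by computing the moments exactly from the binomial distribution. The sketch below is an illustrative addition. The value $p = 0.3$, the sample sizes and the function name are assumptions, and for $r = 2$ the result is compared with the known variance $pq/n$ of $S_n/n$.

```python
import math

def abs_moment(n, p, r):
    """Exact E|S_n/n - p|^r for a binomial S_n with parameters n and p."""
    q = 1.0 - p
    return sum(abs(k / n - p) ** r * math.comb(n, k) * p**k * q**(n - k)
               for k in range(n + 1))

p = 0.3
for n in (10, 100, 400):
    first = abs_moment(n, p, 1)
    second = abs_moment(n, p, 2)
    print(n, f"{first:.5f}", f"{second:.6f}", f"{p * (1 - p) / n:.6f}")
```

Both columns of moments decrease towards zero as $n$ grows, and the second moment coincides with $pq/n$ up to rounding.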


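Proof We show that $\mathbf{E}\xi$ exists. By the properties of integrals (see Lemma A3.2.3 in
Appendix 3), if $\mathbf{E}|\zeta| < \infty$ then $\mathbf{E}(\zeta; A_n) \to 0$ as $\mathbf{P}(A_n) \to 0$. Since $\mathbf{E}|\xi_n| \le c < \infty$
(see property U2), for any $N$ and $\varepsilon$ one has

$$\mathbf{E}\min(|\xi|, N) = \lim_{n \to \infty} \mathbf{E}\bigl[\min(|\xi|, N);\ |\xi_n - \xi| < \varepsilon\bigr] \le \lim_{n \to \infty} \mathbf{E}\min(|\xi_n| + \varepsilon, N) \le c + \varepsilon.$$

It follows that $\mathbf{E}|\xi| \le c$.
Further, for brevity, put $\eta_n = |\xi_n - \xi|$. Then $\eta_n \xrightarrow{p} 0$ and $\eta_n$ are uniformly
integrable together with $\xi_n$. For any $N$ and $\varepsilon$, one has

$$\mathbf{E}\eta_n = \mathbf{E}(\eta_n;\ \eta_n \le \varepsilon) + \mathbf{E}(\eta_n;\ N \ge \eta_n > \varepsilon) + \mathbf{E}(\eta_n;\ \eta_n > N)$$

$$\le \varepsilon + N \mathbf{P}(\eta_n \ge \varepsilon) + \mathbf{E}(\eta_n;\ \eta_n > N). \qquad (6.1.3)$$

Choose $N$ so that $\sup_n \mathbf{E}(\eta_n;\ \eta_n > N) \le \varepsilon$. Then, for such an $N$,

$$\limsup_{n \to \infty} \mathbf{E}\eta_n \le 2\varepsilon.$$

Since $\varepsilon$ is arbitrary, $\mathbf{E}\eta_n \to 0$ as $n \to \infty$.
The relation $\mathbf{E}|\xi_n - \xi|^r \to 0$ can be proved in the same way as (6.1.3), since
$\eta_n^r = |\xi_n - \xi|^r \xrightarrow{p} 0$ and $\eta_n^r$ are uniformly integrable together with $|\xi_n|^r$.
Now we will prove the converse assertion. Let, for simplicity, $r = 1$. One has

$$\mathbf{E}(|\xi_n|;\ |\xi_n| > N) \le \mathbf{E}(|\xi_n - \xi|;\ |\xi_n| > N) + \mathbf{E}(|\xi|;\ |\xi_n| > N)$$

$$\le \mathbf{E}|\xi_n - \xi| + \mathbf{E}(|\xi|;\ |\xi_n| > N)$$

$$\le \mathbf{E}|\xi_n - \xi| + \mathbf{E}(|\xi|;\ |\xi_n - \xi| > 1) + \mathbf{E}(|\xi|;\ |\xi| > N - 1).$$

The first term on the right-hand side tends to zero by the assumption, and the second
term, by Lemma A3.2.3 from Appendix 3, which we have just mentioned, and the
fact that $\mathbf{P}(|\xi_n - \xi| > 1) \to 0$. The last term does not depend on $n$ and can be made
arbitrarily small by choosing $N$. Theorem 6.1.7 is proved.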


Now we can derive yet another continuity theorem which has the following form.

Theorem 6.1.8 If $\xi_n \xrightarrow{p} \xi$, $H(s)$ satisfies the conditions of Theorem 6.1.4, and
$H(\xi_n)$ is uniformly integrable, then, as $n \to \infty$,

$$\mathbf{E}|H(\xi_n) - H(\xi)| \to 0$$

and, in particular, $\mathbf{E}H(\xi_n) \to \mathbf{E}H(\xi)$.


This assertion follows from Theorems 6.1.4 and 6.1.7, for $H(\xi_n) \xrightarrow{p} H(\xi)$ by
Theorem 6.1.4.

Sometimes it is convenient to distinguish between left and right uniform
integrability. We will say that a sequence $\{\xi_n\}$ is right (left) uniformly integrable if

$$\sup_n \mathbf{E}(\xi_n;\ \xi_n \ge N) \to 0 \qquad \Bigl(\sup_n \mathbf{E}(|\xi_n|;\ \xi_n \le -N) \to 0\Bigr)$$

as $N \to \infty$. It is evident that a sequence $\{\xi_n\}$ is uniformly integrable if and only if
it is both right and left uniformly integrable.


Lemma 6.1.1 A sequence $\{\xi_n\}$ is right uniformly integrable if at least one of the
following conditions is met:


1. For any sequence $N(n) \to \infty$ as $n \to \infty$, one has
$\mathbf{E}(\xi_n;\ \xi_n > N(n)) \to 0$.
(This condition is clearly also necessary for uniform integrability.)
2. $\xi_n \le \eta$, where $\mathbf{E}\eta < \infty$.
3. $\mathbf{E}(\xi_n^+)^{1+\alpha} < c < \infty$ for some $\alpha > 0$ (here $x^+ = \max(0, x)$).
4. $\xi_n$ is left uniformly integrable, $\xi_n \xrightarrow{p} \xi$, and $\mathbf{E}\xi_n \to \mathbf{E}\xi < \infty$.


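Proof

1. If the sequence $\{\xi_n\}$ were not right uniformly integrable, there would exist
an $\varepsilon > 0$ and subsequences $n' \to \infty$ and $N' = N'(n') \to \infty$ such that
$\mathbf{E}(\xi_{n'};\ \xi_{n'} > N') > \varepsilon$. But this contradicts condition 1.

2. $\mathbf{E}(\xi_n;\ \xi_n > N) \le \mathbf{E}(\eta;\ \eta > N) \to 0$ as $N \to \infty$.

3. $\mathbf{E}(\xi_n;\ \xi_n > N) \le \mathbf{E}(\xi_n^{1+\alpha} N^{-\alpha};\ \xi_n > N) \le N^{-\alpha} c \to 0$ as $N \to \infty$.

4. Without loss of generality, put $\xi :\equiv 0$. Then

$$\mathbf{E}(\xi_n;\ \xi_n > N) = \mathbf{E}\xi_n - \mathbf{E}(\xi_n;\ \xi_n < -N) - \mathbf{E}(\xi_n;\ |\xi_n| \le N).$$

The first two terms on the right-hand side vanish as $n \to \infty$ for any
$N = N(n) \to \infty$. For the last term, for any $\varepsilon > 0$, one has

$$|\mathbf{E}(\xi_n;\ |\xi_n| \le N)| \le |\mathbf{E}(\xi_n;\ |\xi_n| \le \varepsilon)| + |\mathbf{E}(\xi_n;\ \varepsilon < |\xi_n| \le N)| \le \varepsilon + N \mathbf{P}(|\xi_n| > \varepsilon).$$

For any given $\varepsilon > 0$, choose an $n(\varepsilon)$ such that, for all $n \ge n(\varepsilon)$, we would have
$\mathbf{P}(|\xi_n| > \varepsilon) < \varepsilon$, and put $N(\varepsilon) := [1/\sqrt{\varepsilon}]$. This will mean that, for all $n \ge n(\varepsilon)$
and $N \le N(\varepsilon)$, one has $\mathbf{E}(\xi_n;\ |\xi_n| \le N) < \varepsilon + \sqrt{\varepsilon}$, and therefore condition 1 of the
lemma holds for $\mathbf{E}(\xi_n;\ \xi_n > N)$. The lemma is proved.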


Now, based on the above, we can state three useful corollaries. 


Corollary 6.1.3 (The dominated convergence theorem) If $\xi_n \xrightarrow{p} \xi$, $|\xi_n| < \eta$, and
$\mathbf{E}\eta < \infty$ then $\mathbf{E}\xi$ exists and $\mathbf{E}\xi_n \to \mathbf{E}\xi$.


Corollary 6.1.4 If $\xi_n \xrightarrow{p} \xi$ and $\mathbf{E}|\xi_n|^{r+\alpha} < c < \infty$ for some $\alpha > 0$ then $\xi_n \xrightarrow{(r)} \xi$.


Corollary 6.1.5 If $\xi_n \xrightarrow{p} \xi$ and $H(x)$ is a continuous bounded function, then
$\mathbf{E}|H(\xi_n) - H(\xi)| \to 0$ as $n \to \infty$.


In conclusion of the present section, we will derive one more auxiliary proposi- 
tion that can be useful. 


Lemma 6.1.2 (On integrals over sets of small probability) If $\{\xi_n\}$ is a uniformly
integrable sequence and $\{A_n\}$ is an arbitrary sequence of events such that $\mathbf{P}(A_n) \to 0$,
then $\mathbf{E}(|\xi_n|;\ A_n) \to 0$ as $n \to \infty$.


Proof Put $B_n := \{|\xi_n| \le N\}$. Then

$$\mathbf{E}(|\xi_n|;\ A_n) = \mathbf{E}(|\xi_n|;\ A_n B_n) + \mathbf{E}(|\xi_n|;\ A_n \overline{B}_n) \le N \mathbf{P}(A_n) + \mathbf{E}(|\xi_n|;\ |\xi_n| > N).$$

For a given $\varepsilon > 0$, first choose $N$ so that the second summand on the right-hand side
does not exceed $\varepsilon/2$ and then an $n$ such that the first summand does not exceed $\varepsilon/2$.
We obtain that, by choosing $n$ large enough, we can make $\mathbf{E}(|\xi_n|;\ A_n)$ less than $\varepsilon$.
The lemma is proved.


6.2 Convergence of Distributions 


In Sect. 6.1 we introduced three types of convergence which can be used to charac- 
terise the closeness of random variables given on a common probability space. But 
what can one do if random variables are given on different probability spaces (or if 
it is not known where they are given) which nevertheless have similar distributions? 
(Recall, for instance, the Poisson or de Moivre—Laplace theorems.) In such cases 
one should be able to characterise the closeness of the distributions themselves. 
Having found an apt definition for such a closeness, in many problems we will be 
able to approximate the required but hard to come by distributions by known and, 
as a rule, simpler distributions. 

Now what distributions should be considered as close? We are clearly looking
for a definition of convergence of a sequence of distribution functions $F_n(x)$ to a
distribution function $F(x)$. It would be natural, for instance, that the distributions
of the variables $\xi_n = \xi + 1/n$ should converge to that of $\xi$ as $n \to \infty$. Therefore
requiring in the definition of convergence that $\sup_x |F_n(x) - F(x)|$ is small would
be unreasonable since this condition is not satisfied for the distributions of $\xi + 1/n$
and $\xi$ if $F(x) = \mathbf{P}(\xi < x)$ has at least one point of discontinuity.
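
The simplest instance of this phenomenon is a degenerate variable. The sketch below is an illustrative addition and assumes $\xi \equiv 0$, so that $F$ has a single jump at 0 and $F_n$ has a single jump at $1/n$. It uses the convention $F(x) = \mathbf{P}(\xi < x)$ from the text, and the function names are hypothetical. The uniform distance stays equal to 1, while $F_n(x) \to F(x)$ at every fixed $x \ne 0$.

```python
def F(x):
    """Distribution function F(x) = P(xi < x) of the degenerate variable xi = 0."""
    return 1.0 if x > 0 else 0.0

def F_n(x, n):
    """Distribution function of xi_n = xi + 1/n, i.e. a unit jump at 1/n."""
    return 1.0 if x > 1.0 / n else 0.0

def sup_distance(n, grid):
    """Maximum over the grid of |F_n(x) - F(x)| (a lower bound for the supremum)."""
    return max(abs(F_n(x, n) - F(x)) for x in grid)

for n in (1, 10, 1000):
    grid = [j / (4.0 * n) for j in range(-8, 9)]   # points around the jumps at 0 and 1/n
    print(n, sup_distance(n, grid), F_n(0.05, n) - F(0.05), F_n(-0.05, n) - F(-0.05))
```

The second column stays at 1 for every $n$, while the differences at the fixed points $x = \pm 0.05$ vanish once $1/n < 0.05$.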

We will define the convergence of F,, to F as that which arises when one consid- 
ers convergence in probability. 




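Definition 6.2.1 We will say that distribution functions $F_n$ converge weakly to a
distribution function $F$ as $n \to \infty$, and denote this by $F_n \Rightarrow F$ if, for any continuous
bounded function $f(x)$,

$$\int f(x)\, dF_n(x) \to \int f(x)\, dF(x). \qquad (6.2.1)$$

Considering the distributions $\mathbf{F}_n(B)$ and $\mathbf{F}(B)$ ($B$ are Borel sets) corresponding to
$F_n$ and $F$, we say that $\mathbf{F}_n$ converges weakly to $\mathbf{F}$ and write $\mathbf{F}_n \Rightarrow \mathbf{F}$. One can clearly
re-write (6.2.1) as

$$\int f(x)\, \mathbf{F}_n(dx) \to \int f(x)\, \mathbf{F}(dx) \quad \text{or} \quad \mathbf{E}f(\xi_n) \to \mathbf{E}f(\xi) \qquad (6.2.2)$$

(cf. Corollary 6.1.5), where $\xi_n$ has the distribution $\mathbf{F}_n$ and $\xi$ has the distribution $\mathbf{F}$.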
Another possible definition of weak convergence follows from the next assertion. 


Theorem 6.2.1* $F_n \Rightarrow F$ if and only if $F_n(x) \to F(x)$ at each point of continuity 
$x$ of $F$. 


Proof Let (6.2.1) hold. Consider an $\varepsilon > 0$ and a continuous function $f_\varepsilon(t)$ which is 
equal to 1 for $t < x$ and to 0 for $t \ge x + \varepsilon$, and varies linearly on $[x, x + \varepsilon]$. Since 

$$F_n(x) = \int_{-\infty}^{x} f_\varepsilon(t)\,dF_n(t) \le \int f_\varepsilon(t)\,dF_n(t),$$

by virtue of (6.2.1) one has 

$$\limsup_{n\to\infty} F_n(x) \le \int f_\varepsilon(t)\,dF(t) \le F(x + \varepsilon).$$

If $x$ is a point of continuity of $F$ then 

$$\limsup_{n\to\infty} F_n(x) \le F(x)$$

since $\varepsilon$ is arbitrary. 
In the same way, using the function $f^*_\varepsilon(t) = f_\varepsilon(t + \varepsilon)$, we obtain the inequality 

$$\liminf_{n\to\infty} F_n(x) \ge F(x).$$
We now prove the converse assertion. Let $-M$ and $N$ be points of continuity 
of $F$ such that $F(-M) < \varepsilon/5$ and $1 - F(N) < \varepsilon/5$. Then $F_n(-M) < \varepsilon/4$ and 
$1 - F_n(N) < \varepsilon/4$ for all sufficiently large $n$. Therefore, assuming for simplicity that 
$|f| \le 1$, we obtain that 

$$\int f\,dF_n \quad\text{and}\quad \int f\,dF \tag{6.2.3}$$

will differ from 

$$\int_{-M}^{N} f\,dF_n \quad\text{and}\quad \int_{-M}^{N} f\,dF,$$

respectively, by less than $\varepsilon/2$. Construct on the semi-interval $(-M, N]$ a step function 
$f_\varepsilon$ with jumps at the points of continuity of $F$ which differs from $f$ by less than 
$\varepsilon/2$. Outside $(-M, N]$ we set $f_\varepsilon := 0$. We can put, for instance, 

$$f_\varepsilon(x) = \sum_{j=1}^{k} f(x_j)\,\delta_j(x),$$

where $x_0 = -M < x_1 < \cdots < x_k = N$ are appropriately chosen points of continuity 
of $F$, and $\delta_j(x)$ is the indicator function of the semi-interval $(x_{j-1}, x_j]$. Then 
$\int f_\varepsilon\,dF_n$ and $\int f_\varepsilon\,dF$ will differ from the respective integrals in (6.2.3), for 
sufficiently large $n$, by less than $\varepsilon$. At the same time, 

$$\int f_\varepsilon\,dF_n = \sum_{j=1}^{k} f(x_j)\,\bigl[F_n(x_j) - F_n(x_{j-1})\bigr] \to \int f_\varepsilon\,dF.$$

Since $\varepsilon > 0$ is arbitrary, the last relation implies (6.2.1). (Indeed, one just has to 
make use of the inequality 

$$\limsup \int f\,dF_n \le \varepsilon + \limsup \int f_\varepsilon\,dF_n = \varepsilon + \int f_\varepsilon\,dF \le 2\varepsilon + \int f\,dF$$

and a similar inequality for $\liminf \int f\,dF_n$.) The theorem is proved. 

*In many texts on probability theory the condition of the theorem is given as the definition of weak 
convergence. However, the definition in terms of the relation (6.2.2) is apparently more appropriate 
for it continues to remain valid for distributions on arbitrary topological spaces (see, e.g. [1, 25]). 


For remarks on different and, in a certain sense, simpler proofs of the second 
assertion of Theorem 6.2.1, see the end of Sect. 6.3 and Sect. 7.4. 
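As an illustration of relation (6.2.2), here is a small sketch that compares $\mathbf{E}f(\xi_n)$ with $\mathbf{E}f(\xi)$ for $\xi_n = \xi + 1/n$. The two-point law of $\xi$, the test functions and the helper names are assumptions made only for this example.

```python
import math

def expect(f, atoms):
    """E f(X) for a discrete X given as a list of (value, probability) pairs."""
    return sum(p * f(x) for x, p in atoms)

xi = [(0.0, 0.5), (1.0, 0.5)]                  # xi takes the values 0 and 1

def xi_n(n):
    """Law of xi + 1/n."""
    return [(x + 1.0 / n, p) for x, p in xi]

f = math.atan                                  # bounded and continuous
g = lambda x: 1.0 if x <= 0 else 0.0           # bounded but discontinuous at 0

for n in (1, 10, 100, 1000):
    print(n,
          abs(expect(f, xi_n(n)) - expect(f, xi)),   # tends to 0
          abs(expect(g, xi_n(n)) - expect(g, xi)))   # stays equal to 0.5
```

For the continuous function the difference is at most $1/n$, whereas the indicator `g` jumps at an atom of $\xi$ and the expectations do not converge. This is why Definition 6.2.1 restricts attention to continuous bounded $f$.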


Remark 6.2.1 Repeating with obvious modifications the above-presented proof, we 
can get a somewhat different equivalent of convergence ($\Rightarrow$): convergence of differences 
$F_n(y) - F_n(x) \to F(y) - F(x)$ for any points of continuity $x$ and $y$ of $F$. 


Remark 6.2.2 If $F(x)$ is continuous then convergence $F_n \Rightarrow F$ is equivalent to the 
uniform convergence $\sup_x |F_n(x) - F(x)| \to 0$. 


We leave the proof of the last assertion to the reader. It follows from the fact 
that convergence $F_n(x) \to F(x)$ at any $x$ implies, by virtue of the continuity of $F$, 
uniform convergence on any finite interval. The uniform smallness of $F_n(x) - F(x)$ 
on the “tails” is ensured by the smallness of $F(x)$ and $1 - F(x)$. 


Remark 6.2.3 If distributions $\mathbf{F}_n$ and $\mathbf{F}$ are discrete and have jumps at the same 
points $x_1, x_2, \ldots$ then $\mathbf{F}_n \Rightarrow \mathbf{F}$ will clearly be equivalent to the convergence of the 
probabilities of the values $x_1, x_2, \ldots$ ($F_n(x_k + 0) - F_n(x_k) \to F(x_k + 0) - F(x_k)$). 


We introduce some notation which will be convenient for the sequel. Let $\xi_n$ and 
$\xi$ be some random variables (given, generally speaking, on different probability 
spaces) such that $\xi_n \mathrel{\subset\!\!\!=} \mathbf{F}_n$ and $\xi \mathrel{\subset\!\!\!=} \mathbf{F}$. 


Definition 6.2.2 If $F_n \Rightarrow F$ we will say that $\xi_n$ converges to $\xi$ in distribution and 
write $\xi_n \Rightarrow \xi$. 


We used here the same symbol $\Rightarrow$ as for the weak convergence, but this leads to 
no confusion. 

It is clear that $\xi_n \xrightarrow{p} \xi$ implies $\xi_n \Rightarrow \xi$, but not vice versa. 

At the same time the following assertion holds true. 


Lemma 6.2.1 If $\xi_n \Rightarrow \xi$ ($F_n \Rightarrow F$) then one can construct random variables $\xi'_n$ 
and $\xi'$ on a common probability space so that $\mathbf{P}(\xi'_n < x) = \mathbf{P}(\xi_n < x) = F_n(x)$, 
$\mathbf{P}(\xi' < x) = \mathbf{P}(\xi < x) = F(x)$, and 

$$\xi'_n \xrightarrow{a.s.} \xi'.$$

Proof Define the quantile transforms (see Definition 3.2.6) by 

$$F_n^{(-1)}(t) = \sup\{x : F_n(x) < t\}, \qquad F^{(-1)}(t) = \sup\{x : F(x) < t\}.$$

(If $F(x)$ is continuous and strictly increasing then $F^{(-1)}(t)$ coincides with the solution 
to the equation $F(v) = t$.) Let $\eta \mathrel{\subset\!\!\!=} \mathbf{U}_{0,1}$. Put 

$$\xi'_n = F_n^{(-1)}(\eta), \qquad \xi' = F^{(-1)}(\eta)$$

(cf. Theorem 3.2.2), and show that $\xi'_n \xrightarrow{a.s.} \xi'$. In order to do that, it suffices to prove 
that $F_n^{(-1)}(y) \to F^{(-1)}(y)$ for almost all $y \in [0, 1]$. 

The functions $F$ and $F^{(-1)}$ are monotone and hence each of them has at most 
a countable set of discontinuity points. This means that, for all $y \in [0, 1]$ with the 
possible exclusion of the points from a countable set $T$, the function $F^{(-1)}(y)$ will 
be continuous. 

So let $y$ be a point of continuity of $F^{(-1)}$, and $F^{(-1)}(y) = x$. 

For $t \le y$, choose a continuous strictly increasing function $G^{(-1)}(t)$ such that 

$$G^{(-1)}(y) = F^{(-1)}(y), \qquad G^{(-1)}(t) \le F^{(-1)}(t) \quad\text{for } t < y.$$

Denote by $G(v)$, $v \le x$, the function inverse to $G^{(-1)}$. Clearly, $G(v)$ dominates 
the function $F(v)$ in the domain $v \le x$. By virtue of the continuity and strict 
monotonicity of the functions $G^{(-1)}$ and $G$ (in the domain under consideration), for 
$\varepsilon > 0$ we have 

$$G(x - \varepsilon) = y - \delta(\varepsilon),$$

where $\delta(\varepsilon) > 0$, $\delta(\varepsilon) \to 0$ as $\varepsilon \to 0$. Choose an $\varepsilon$ such that $x - \varepsilon$ is a point of 
continuity of $F$. Then, for all $n$ large enough, 

$$F_n(x - \varepsilon) \le F(x - \varepsilon) + \frac{\delta(\varepsilon)}{2} \le G(x - \varepsilon) + \frac{\delta(\varepsilon)}{2} = y - \frac{\delta(\varepsilon)}{2}.$$

Since the right-hand side is less than $y$, the definition of the quantile transform gives 
$F_n^{(-1)}(y) \ge x - \varepsilon$ for all such $n$. 

The opposite inequality can be proved in a similar way. Since $\varepsilon$ can be arbitrarily 
small, we obtain that, for almost all $y$, 

$$F_n^{(-1)}(y) \to F^{(-1)}(y) \quad\text{as } n \to \infty.$$

Hence $F_n^{(-1)}(\eta) \to F^{(-1)}(\eta)$ with probability 1 with respect to the distribution of $\eta$. 
The lemma is proved. 
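The construction in the proof can be tried out on a concrete family. The sketch below assumes, purely for illustration, that $F_n$ is the exponential law with rate $1 + 1/n$ and $F$ is the exponential law with rate 1; in this continuous, strictly increasing case the quantile transform is the ordinary inverse. The function name `quantile_exp` is hypothetical.

```python
import math
import random

def quantile_exp(t, rate):
    """Quantile transform of the exponential law: solves 1 - exp(-rate * x) = t."""
    return -math.log(1.0 - t) / rate

random.seed(0)
eta = random.random()                  # one draw of eta, uniform on [0, 1)
xi_prime = quantile_exp(eta, 1.0)      # xi' = F^(-1)(eta)

for n in (1, 10, 100, 1000):
    xi_n_prime = quantile_exp(eta, 1.0 + 1.0 / n)    # xi'_n = F_n^(-1)(eta)
    print(n, abs(xi_n_prime - xi_prime))
```

Because all the variables are functions of the same $\eta$, the convergence is pointwise in $\eta$: here $|\xi'_n - \xi'| = \xi'/(n+1)$.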


Lemma 6.2.1 remains true for vector-valued random variables as well. 
Sometimes it is also convenient to have a simple symbol for the relation “the 
distribution of $\xi_n$ converges weakly to $\mathbf{F}$”. We will write this relation as 

$$\xi_n \mathrel{\subset\!\!\!\Rightarrow} \mathbf{F}, \tag{6.2.4}$$

so that the symbol $\mathrel{\subset\!\!\!\Rightarrow}$ expresses the same fact as $\Rightarrow$ but relates objects of a different 
nature in the same way as the symbol $\mathrel{\subset\!\!\!=}$ in the relation $\xi \mathrel{\subset\!\!\!=} \mathbf{P}$ (on the left-hand 
side in (6.2.4) we have random variables, while on the right-hand side there is a 
distribution). 

In these terms, the assertion of the Poisson theorem could be written as $S_n \mathrel{\subset\!\!\!\Rightarrow} \boldsymbol{\Pi}_\mu$, 
while the statement of the law of large numbers for the Bernoulli scheme takes the 
form $S_n/n \mathrel{\subset\!\!\!\Rightarrow} \mathbf{I}_p$. 


The coincidence of the distributions of $\xi$ and $\eta$ will be denoted by $\xi \stackrel{d}{=} \eta$. 


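Lemma 6.2.2 If $\xi_n \Rightarrow \xi$ and $\varepsilon_n \xrightarrow{p} 0$ then $\xi_n + \varepsilon_n \Rightarrow \xi$. 
If $\xi_n \Rightarrow \xi$ and $\gamma_n \xrightarrow{p} 1$ then $\xi_n \gamma_n \Rightarrow \xi$. 


Proof Let us prove the first assertion. For any $t$ and $\delta > 0$ such that $t$ and $t \pm \delta$ are 
points of continuity of $\mathbf{P}(\xi < t)$, one has 

$$\limsup_{n\to\infty} \mathbf{P}(\xi_n + \varepsilon_n < t) = \limsup_{n\to\infty} \mathbf{P}(\xi_n + \varepsilon_n < t,\ \varepsilon_n > -\delta) \le \limsup_{n\to\infty} \mathbf{P}(\xi_n < t + \delta) = \mathbf{P}(\xi < t + \delta).$$

Similarly, 

$$\liminf_{n\to\infty} \mathbf{P}(\xi_n + \varepsilon_n < t) \ge \mathbf{P}(\xi < t - \delta).$$

Since $\mathbf{P}(\xi < t \pm \delta)$ can be chosen arbitrarily close to $\mathbf{P}(\xi < t)$ by taking a sufficiently 
small $\delta$, the required convergence follows. 
The second assertion can be proved in the same way. The lemma is proved. 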


Now we will give analogues of Theorems 6.1.4 and 6.1.7 in terms of distribu- 
tions. 


Theorem 6.2.2 If $\xi_n \Rightarrow \xi$ and a function $H(s)$ satisfies the conditions of Theorem 6.1.4 
then $H(\xi_n) \Rightarrow H(\xi)$. 


Theorem 6.2.3 If $\xi_n \Rightarrow \xi$ and the sequence $\{\xi_n\}$ is uniformly integrable then $\mathbf{E}\xi$ 
exists and $\mathbf{E}\xi_n \to \mathbf{E}\xi$. 


Proof There are two ways of proving these theorems. One of them consists of reducing 
them to Theorems 6.1.4 and 6.1.7. To this end, one has to construct random 
variables $\xi'_n = F_n^{(-1)}(\eta)$ and $\xi' = F^{(-1)}(\eta)$, where $\eta \mathrel{\subset\!\!\!=} \mathbf{U}_{0,1}$ and $F_n^{(-1)}$ and $F^{(-1)}$ 
are the quantile transforms of $F_n$ and $F$, respectively, and prove that $\xi'_n \xrightarrow{p} \xi'$ (we 
already know that $F^{(-1)}(\eta) \mathrel{\subset\!\!\!=} \mathbf{F}$; if $F$ is discontinuous or not strictly increasing, 
then $F^{(-1)}$ should be defined as in Lemma 6.2.1). 

Another approach is to prove the theorems anew using the language of distributions. 
Under inessential additional assumptions, such proofs are sometimes even 
simpler. To illustrate this, assume, for instance, in Theorem 6.2.2 that the function 
$H$ is continuous. One has to prove that $\mathbf{E}g(H(\xi_n)) \to \mathbf{E}g(H(\xi))$ for any continuous 
bounded function $g$. But this is an immediate consequence of (6.2.1) and (6.2.2), for 
$f = g \circ H$ ($f$ is the composition of the functions $g$ and $H$). 

In Theorem 6.2.3 assume that $\xi_n \ge 0$ (this does not restrict the generality). Then, 
integrating by parts, we get 

$$\mathbf{E}\xi_n = -\int_0^\infty x\,d\mathbf{P}(\xi_n \ge x) = \int_0^\infty \mathbf{P}(\xi_n \ge x)\,dx. \tag{6.2.5}$$

Since by virtue of uniform integrability 

$$\sup_n \int_N^\infty \mathbf{P}(\xi_n \ge x)\,dx \le \sup_n \mathbf{E}(\xi_n;\ \xi_n \ge N) \to 0$$

as $N \to \infty$, the integral in (6.2.5) is uniformly convergent. Moreover, $\mathbf{P}(\xi_n \ge x) \to 
\mathbf{P}(\xi \ge x)$ for almost all $x$, and therefore 

$$\lim_{n\to\infty} \mathbf{E}\xi_n = \lim_{n\to\infty} \int_0^\infty \mathbf{P}(\xi_n \ge x)\,dx = \int_0^\infty \mathbf{P}(\xi \ge x)\,dx = \mathbf{E}\xi.$$
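The tail-integral representation (6.2.5) of the mean is easy to check numerically. The sketch below assumes an exponential law with rate 2 (so that $\mathbf{P}(\xi \ge x) = e^{-2x}$ and $\mathbf{E}\xi = 1/2$) and a plain trapezoidal rule; both choices and the name `tail_integral` are ours.

```python
import math

def tail_integral(tail, upper, steps):
    """Trapezoidal approximation of the integral of P(xi >= x) over [0, upper]."""
    h = upper / steps
    total = 0.5 * (tail(0.0) + tail(upper))
    total += sum(tail(k * h) for k in range(1, steps))
    return h * total

rate = 2.0
tail = lambda x: math.exp(-rate * x)     # P(xi >= x) for the exponential law
approx = tail_integral(tail, upper=40.0, steps=400_000)
print(approx, 1.0 / rate)                # the two numbers should nearly agree
```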


Conditions ensuring uniform integrability are contained in Lemma 6.1.1. Now 
we will give a modification of assertion 4 of this lemma for the case of weak con- 
vergence. 


Lemma 6.2.3 If $\{\xi_n\}$ is left uniformly integrable, $\xi_n \Rightarrow \xi$ and $\mathbf{E}\xi_n \to \mathbf{E}\xi$ then $\{\xi_n\}$ 
is uniformly integrable. 


We suggest that the reader construct examples showing that all three conditions 
of the lemma are essential. 

Lemma 6.2.3 implies, in particular, that if $\xi_n \ge 0$, $\xi_n \Rightarrow \xi$ and $\mathbf{E}\xi_n \to \mathbf{E}\xi$ then 
$\{\xi_n\}$ is uniformly integrable. 

As for Theorems 6.2.2 and 6.2.3, two alternative ways to prove the result are 
possible here. One of them consists of using Lemma 6.1.1. We will present here a 
different, somewhat simpler, proof. 




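Proof of Lemma 6.2.3 For simplicity assume that $\xi_n \ge 0$. Suppose that the lemma 
is not valid. Then there exist an $\varepsilon > 0$ and subsequences $n' \to \infty$ and $N(n') \to \infty$ 
such that 

$$\mathbf{E}\bigl(\xi_{n'};\ \xi_{n'} > N(n')\bigr) > \varepsilon.$$

Since 

$$\mathbf{E}\xi_{n'} = \mathbf{E}(\xi_{n'};\ \xi_{n'} \le N) + \mathbf{E}(\xi_{n'};\ \xi_{n'} > N),$$

for any $N$ that is a point of continuity of the distribution of $\xi$, one has 

$$\mathbf{E}\xi = \lim_{n'\to\infty} \mathbf{E}\xi_{n'} \ge \mathbf{E}(\xi;\ \xi \le N) + \varepsilon.$$

Choose an $N$ such that the first summand on the right-hand side exceeds $\mathbf{E}\xi - \varepsilon/2$. 
Then we obtain the contradiction $\mathbf{E}\xi \ge \mathbf{E}\xi + \varepsilon/2$, which proves the lemma. 

We leave it to the reader to extend the proof to the case of arbitrary left uniformly 
integrable $\{\xi_n\}$. 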
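One example of the kind the reader is asked to construct can be sketched as follows. It assumes the sequence $\xi_n$ taking the value $n$ with probability $1/n$ and the value 0 otherwise (a standard choice, not taken from the text): here $\xi_n \ge 0$ and $\xi_n \Rightarrow 0$, but $\mathbf{E}\xi_n = 1$ does not converge to 0, and the sequence is not uniformly integrable.

```python
def law(n):
    """Law of xi_n: value n with probability 1/n, value 0 otherwise."""
    return [(0.0, 1.0 - 1.0 / n), (float(n), 1.0 / n)]

def mean(atoms):
    """E xi for a discrete law given as (value, probability) pairs."""
    return sum(x * p for x, p in atoms)

def tail_mean(atoms, N):
    """E(xi; xi > N), the quantity controlled by uniform integrability."""
    return sum(x * p for x, p in atoms if x > N)

for n in (10, 100, 1000):
    atoms = law(n)
    # P(xi_n = 0) tends to 1, yet the mean and the tail mean stay near 1.
    print(n, atoms[0][1], mean(atoms), tail_mean(atoms, N=n - 1))
```

The tail expectation $\mathbf{E}(\xi_n; \xi_n > N)$ equals 1 whenever $n > N$, so its supremum over $n$ does not vanish as $N \to \infty$.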


The following theorem can also be useful. 


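Theorem 6.2.4 Suppose that $\xi_n \Rightarrow \xi$, $H(s)$ is differentiable at a point $a$, and 
$b_n \to 0$ as $n \to \infty$. Then 

$$\frac{1}{b_n}\bigl(H(a + b_n\xi_n) - H(a)\bigr) \Rightarrow \xi H'(a).$$

If $H'(a) = 0$ and $H''(a)$ exists then 

$$\frac{1}{b_n^2}\bigl(H(a + b_n\xi_n) - H(a)\bigr) \Rightarrow \frac{\xi^2}{2}H''(a).$$


Proof Consider the function 

$$h(x) = \begin{cases} \dfrac{H(a + x) - H(a)}{x} & \text{if } x \ne 0,\\[1ex] H'(a) & \text{if } x = 0, \end{cases}$$

which is continuous at the point $x = 0$. Since $b_n\xi_n \Rightarrow 0$, by Theorem 6.2.2 one has 
$h(b_n\xi_n) \Rightarrow h(0) = H'(a)$. Using the theorem again (this time for two-dimensional 
distributions), we get 

$$\frac{H(a + b_n\xi_n) - H(a)}{b_n} = h(b_n\xi_n)\,\xi_n \Rightarrow H'(a)\,\xi.$$

The second assertion is proved in the same way. 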


A multivariate analogue of this theorem will look somewhat more complicated. 
The reader could obtain it himself, following the lines of the argument proving The- 
orem 6.2.4. 
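The mechanism behind Theorem 6.2.4 can be seen on fixed sample values. The sketch below assumes $H(s) = e^s$, $a = 0.5$ and $b_n = 1/n$ (all hypothetical choices) and evaluates the scaled increment at a few values $x$ playing the role of $\xi_n$.

```python
import math

def scaled_increment(H, a, b, x):
    """(H(a + b*x) - H(a)) / b, the left-hand side of the theorem at xi_n = x."""
    return (H(a + b * x) - H(a)) / b

H, dH = math.exp, math.exp       # H(s) = e^s, so that H'(a) = e^a
a = 0.5
for x in (-2.0, 0.3, 1.5):       # sample values of xi_n
    for n in (10, 1000, 100000):
        b_n = 1.0 / n
        # Difference from the limit x * H'(a); it shrinks like b_n.
        print(x, n, scaled_increment(H, a, b_n, x) - x * dH(a))
```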




6.3 Conditions for Weak Convergence 


Now we will return to the concept of weak convergence. We have two criteria for this 
convergence: relation (6.2.1) and Theorem 6.2.1. However, from the point of view 
of their possible applications (their verification in concrete problems) both these 
criteria are inconvenient. For instance, proving, say, convergence $\mathbf{E}f(\xi_n) \to \mathbf{E}f(\xi)$ 
not for all continuous bounded functions f but just for elements f of a certain rather 
narrow class of functions that has a simple and clear nature would be much easier. 
It is obvious, however, that such a class cannot be very narrow. 

Before stating the basic assertions, we will introduce a few concepts. 

Extend the class $\mathcal{F}$ of all distribution functions to the class $\mathcal{G}$ of all functions 
$G$ satisfying conditions F1 and F2 from Sect. 3.2 and conditions $G(-\infty) \ge 0$, 
$G(\infty) \le 1$. Functions $G$ from $\mathcal{G}$ could be called generalised distribution functions. 
One can think of them as distribution functions of improper random variables assuming 
infinite values with positive probabilities, so that $G(-\infty) = \mathbf{P}(\xi = -\infty)$ 
and $1 - G(\infty) = \mathbf{P}(\xi = \infty)$. We will write $G_n \Rightarrow G$ for $G_n \in \mathcal{G}$ and $G \in \mathcal{G}$ if 
$G_n(x) \to G(x)$ at all points of continuity of $G(x)$. 


Theorem 6.3.1 (Helly) The class $\mathcal{G}$ is compact with respect to convergence $\Rightarrow$, 
i.e. from any sequence $\{G_n\}$, $G_n \in \mathcal{G}$, one can choose a convergent subsequence 
$G_{n_k} \Rightarrow G \in \mathcal{G}$. 

For the proof of Theorem 6.3.1, see Appendix 4. 


Corollary 6.3.1 If each convergent subsequence $\{G_{n_k}\}$ of $\{G_n\}$ with $G_n \in \mathcal{G}$ converges 
to $G$ then $G_n \Rightarrow G$. 


Proof If $G_n \not\Rightarrow G$ then there exists a point of continuity $x_0$ of $G$ such that $G_n(x_0) \not\to 
G(x_0)$. Since $G_n(x_0) \in [0, 1]$, there exists a convergent subsequence $G_{n_k}$ such 
that $G_{n_k}(x_0) \to g \ne G(x_0)$. This, however, is impossible by our assumption, for 
$G_{n_k}(x_0) \to G(x_0)$. 


The reason for extending the class $\mathcal{F}$ of all distribution functions is that it is not 
compact (in the sense of Theorem 6.3.1) and convergence $F_n \Rightarrow G$, $F_n \in \mathcal{F}$, does 
not imply that $G \in \mathcal{F}$. For example, the sequence 

$$F_n(x) = \begin{cases} 0 & \text{if } x \le -n,\\ 1/2 & \text{if } -n < x \le n,\\ 1 & \text{if } x > n \end{cases} \tag{6.3.1}$$

converges everywhere to the function $G(x) \equiv 1/2 \notin \mathcal{F}$ corresponding to an improper 
random variable taking the values $\pm\infty$ with probabilities 1/2. 

However, dealing with the class $\mathcal{G}$ is also not very convenient. The fact is that 
convergence at points of continuity $G_n \Rightarrow G$ in the class $\mathcal{G}$ is not equivalent to 
convergence 

$$\int f\,dG_n \to \int f\,dG$$

(see example (6.3.1) for $f \equiv 1$), and the integrals $\int f\,dG$ do not specify $G$ uniquely 
(they specify the increments of $G$, but not the values $G(-\infty)$ and $G(\infty)$). Now we 
will introduce two concepts that will help to avoid the above-mentioned inconvenience. 

Definition 6.3.1 A sequence of distributions $\{\mathbf{F}_n\}$ (or distribution functions $\{F_n\}$) 
is said to be tight if, for any $\varepsilon > 0$, there exists an $N$ such that 

$$\inf_n \mathbf{F}_n([-N, N]) > 1 - \varepsilon. \tag{6.3.2}$$
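Condition (6.3.2) can be contrasted on two simple sequences. The sketch below assumes, for illustration, the sequence (6.3.1) (atoms of mass 1/2 at $-n$ and $n$) and the unit masses at the points $1/n$; the function names are hypothetical.

```python
def mass_6_3_1(n, N):
    """F_n([-N, N]) for the sequence (6.3.1): atoms of mass 1/2 at -n and at n."""
    return sum(0.5 for atom in (-n, n) if -N <= atom <= N)

def mass_shifted(n, N):
    """F_n([-N, N]) for the unit mass at the point 1/n (a tight sequence)."""
    return 1.0 if -N <= 1.0 / n <= N else 0.0

N = 50
# For n > N both atoms of (6.3.1) lie outside [-N, N], so the infimum is 0.
print(min(mass_6_3_1(n, N) for n in range(1, 1001)))
# The unit masses at 1/n never leave [-1, 1], so the infimum is 1.
print(min(mass_shifted(n, N) for n in range(1, 1001)))
```

Whatever $N$ is chosen, the first infimum is 0 once $n$ is allowed to exceed $N$, so (6.3.2) fails for every $\varepsilon < 1$; the second sequence satisfies (6.3.2) already with $N = 1$.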


Definition 6.3.2 A class $\mathcal{L}$ of continuous bounded functions is said to be distribution 
determining if the equality 

$$\int f(x)\,dF(x) = \int f(x)\,dG(x), \qquad F \in \mathcal{F},\ G \in \mathcal{G},$$

for all $f \in \mathcal{L}$ implies that $F = G$ (or, which is the same, if the relation $\mathbf{E}f(\xi) = 
\mathbf{E}f(\eta)$ for all $f \in \mathcal{L}$, where one of the random variables $\xi$ and $\eta$ is proper, implies 
that $\xi \stackrel{d}{=} \eta$). 
The next theorem is the main result of the present section. 


Theorem 6.3.2 Let $\mathcal{L}$ be a distribution determining class and $\{\mathbf{F}_n\}$ a sequence of 
distributions. For the existence of a distribution $\mathbf{F} \in \mathcal{F}$ such that $\mathbf{F}_n \Rightarrow \mathbf{F}$ it is necessary 
and sufficient that³ 

(1) the sequence $\{\mathbf{F}_n\}$ is tight; and 
(2) $\lim_{n\to\infty} \int f\,dF_n$ exists for all $f \in \mathcal{L}$. 


Proof The necessary part is obvious. 

Sufficiency. By Theorem 6.3.1 there exists a subsequence $F_{n_k} \Rightarrow F \in \mathcal{G}$. But 
by condition (1) one has $F \in \mathcal{F}$. Indeed, if $x \ge N$ is a point of continuity of $F$ 
then, by Definition 6.3.1, $F(x) = \lim F_{n_k}(x) \ge 1 - \varepsilon$. In a similar way we establish 
that for $x \le -N$ one has $F(x) < \varepsilon$. Since $\varepsilon$ is arbitrary, we have $F(-\infty) = 0$ and 
$F(\infty) = 1$. 

Further, take another convergent subsequence $F_{n'_k} \Rightarrow G \in \mathcal{F}$. Then, for any 
$f \in \mathcal{L}$, one has 

$$\lim \int f\,dF_{n_k} = \int f\,dF, \qquad \lim \int f\,dF_{n'_k} = \int f\,dG. \tag{6.3.3}$$

But, by condition (2), 

$$\int f\,dF = \int f\,dG, \tag{6.3.4}$$

and hence $F = G$. The theorem is proved by virtue of Corollary 6.3.1. 


³In this form the theorem persists for spaces of a more general nature. The role of the segments 
$[-N, N]$ in (6.3.2) is played in that case by compact sets (cf. [1, 14, 25, 31]). 


[Fig. 6.1 The plot of the function $f_{a,\varepsilon}(x)$ from Example 6.3.1; the marks on the horizontal axis are 0, $a$ and $a + \varepsilon$, and the level 1 is marked on the vertical axis.] 

If one needs to prove convergence to a “known” distribution $\mathbf{F} \in \mathcal{F}$, the tightness 
condition in Theorem 6.3.2 becomes redundant. 


Corollary 6.3.2 Let $\mathcal{L}$ be a distribution determining class and 

$$\int f\,dF_n \to \int f\,dF, \qquad F \in \mathcal{G}, \tag{6.3.5}$$

for any $f \in \mathcal{L}$. Moreover, assume that at least one of the following three conditions 
is met: 

(1) the sequence $\{F_n\}$ is tight; 
(2) $F \in \mathcal{F}$; 
(3) $f \equiv 1 \in \mathcal{L}$ (i.e. (6.3.5) holds for $f \equiv 1$). 

Then $F \in \mathcal{F}$ and $F_n \Rightarrow F$. 


The proof of the corollary is almost obvious. Under condition (1) the assertion 
follows immediately from Theorem 6.3.2. Condition (3) and convergence 
(6.3.5) imply condition (2). If (2) holds, then $F \in \mathcal{F}$ in relations (6.3.3) and (6.3.4), 
and therefore $G = F$. 


Since, as a rule, at least one of conditions (1)–(3) is satisfied (as we will see 
below), the basic task is to verify convergence (6.3.5) for the class $\mathcal{L}$. 

Note also that, in the case where one proves convergence to a distribution $\mathbf{F} \in \mathcal{F}$ 
“known” in advance, the whole arrangement of the argument can be different and 
simpler. One such alternative approach is presented in Sect. 7.4. 

Now we will give several examples of distribution determining classes $\mathcal{L}$. 


Example 6.3.1 The class $\mathcal{L}_0$ of functions having the form 

$$f_{a,\varepsilon}(x) = \begin{cases} 1 & \text{if } x \le a,\\ 0 & \text{if } x \ge a + \varepsilon. \end{cases}$$

On the segment $[a, a + \varepsilon]$ the functions $f_{a,\varepsilon}$ are defined to be linear and continuous 
(a plot of $f_{a,\varepsilon}(x)$ is given in Fig. 6.1). It is a two-parameter family of functions. 
We show that $\mathcal{L}_0$ is a distribution determining class. Let 

$$\int f\,dF = \int f\,dG$$

for all $f \in \mathcal{L}_0$. Then 

$$F(a) \le \int f_{a,\varepsilon}\,dF = \int f_{a,\varepsilon}\,dG \le G(a + \varepsilon),$$

and, conversely, 

$$G(a) \le F(a + \varepsilon)$$

for any $\varepsilon > 0$. Taking $a$ to be a point of continuity of both $F$ and $G$, we obtain that 

$$F(a) = G(a).$$

Since this is valid for all points of continuity, we get $F = G$. 
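The sandwich inequality used in this example can be verified on a small discrete law. In the sketch below the four-atom distribution `atoms` is a hypothetical choice, $F(x) = \mathbf{P}(\xi < x)$ as elsewhere in the text, and `f` implements $f_{a,\varepsilon}$ with the linear piece on $[a, a+\varepsilon]$.

```python
def f(a, eps, x):
    """f_{a,eps}: 1 for x <= a, 0 for x >= a + eps, linear in between."""
    if x <= a:
        return 1.0
    if x >= a + eps:
        return 0.0
    return (a + eps - x) / eps

atoms = [(-1.0, 0.2), (0.0, 0.3), (0.05, 0.1), (2.0, 0.4)]   # hypothetical law

def F(x):
    """Distribution function P(xi < x) of the law `atoms`."""
    return sum(p for v, p in atoms if v < x)

def integral(a, eps):
    """Integral of f_{a,eps} with respect to the law `atoms`."""
    return sum(p * f(a, eps, v) for v, p in atoms)

a = 0.02    # a continuity point of F
for eps in (1.0, 0.1, 0.01, 0.001):
    # Each row satisfies F(a) <= integral <= F(a + eps).
    print(eps, F(a), integral(a, eps), F(a + eps))
```

As $\varepsilon \to 0$ the integral is squeezed to $F(a)$, which is how the values of $F$ at continuity points are recovered from the integrals of functions in $\mathcal{L}_0$.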


One can easily verify in a similar way that the class $\widehat{\mathcal{L}}_0$ of “trapezium-shaped” 
functions $f(x) = \min(f_{a,\varepsilon}, 1 - f_{b,\varepsilon})$, $b < a$, is also distribution determining. 


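Example 6.3.2 The class $\mathcal{L}_1$ of continuous bounded functions such that, for each 
$f \in \mathcal{L}_0$ (or $f \in \widehat{\mathcal{L}}_0$), there exists a sequence $f_n \in \mathcal{L}_1$, $\sup_x |f_n(x)| < M < \infty$, for 
which $\lim_{n\to\infty} f_n(x) = f(x)$ for each $x \in \mathbb{R}$. 

Let 

$$\int f\,dF = \int f\,dG$$

for all $f \in \mathcal{L}_1$. By the dominated convergence theorem, 

$$\lim \int f_n\,dF = \int f\,dF, \qquad \lim \int f_n\,dG = \int f\,dG, \qquad f \in \mathcal{L}_0.$$

Therefore 

$$\int f\,dF = \int f\,dG, \quad f \in \mathcal{L}_0, \qquad F = G,$$

and hence $\mathcal{L}_1$ is a distribution determining class. 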


Example 6.3.3 The class $C_k$ of all bounded functions $f(x)$ having bounded uniformly 
continuous $k$-th derivatives $f^{(k)}(x)$ ($\sup_x |f^{(k)}(x)| < \infty$), $k \ge 1$. 

It is evident that $C_k$ is a distribution determining class for it is a special case of 
an $\mathcal{L}_1$ class. 

In the same way one can see that the subclass $C_k^0 \subset C_k$ of functions having finite 
support (vanishing outside a finite interval) is also distribution determining. 
This follows from the fact that $C_k^0$ is an $\mathcal{L}_1$-class with respect to the class $\widehat{\mathcal{L}}_0$ of 
trapezium-shaped (and therefore having compact support) functions. 

It is clear that the class $C_k$ satisfies condition (3) from Corollary 6.3.2 ($f \equiv 
1 \in C_k$). Therefore, to prove convergence $F_n \Rightarrow F \in \mathcal{F}$ it suffices to verify convergence 
(6.3.5) for $f \in C_k$ only. 

If one takes $\mathcal{L}$ to be the class $C_1^0$ of differentiable functions with finite support 
then relation (6.3.5) together with condition (2) of Corollary 6.3.2 could be 
re-written as 

$$\int f'(x)\,F_n(x)\,dx \to \int f'(x)\,F(x)\,dx, \qquad F \in \mathcal{F}. \tag{6.3.6}$$

(One has to integrate (6.3.5) by parts and use the fact that $f'$ also has a finite support.) 
The convergence criterion (6.3.6) is sometimes useful. It can be used to show, 
for example, that (6.3.5) follows from convergence $F_n(x) \to F(x)$ at all points of 
continuity of $F$ (i.e. almost everywhere), since that convergence and the dominated 
convergence theorem imply (6.3.6) which is equivalent to (6.3.5). 


Example 6.3.4 One of the most important distribution determining classes is the 
one-parameter family of complex-valued functions $\{e^{itx}\}$, $t \in \mathbb{R}$. 


The next chapter will be devoted to studying the properties of $\int e^{itx}\,dF(x)$. 
After obvious changes, all the material in the present chapter can be extended to 
the multivariate case. 


Chapter 7 
Characteristic Functions 


Abstract Section 7.1 begins with formal definitions and contains an extensive dis- 
cussion of the basic properties of characteristic functions, including those related to 
the nature of the underlying distributions. Section 7.2 presents the proofs of the in- 
version formulas for both densities and distribution functions, and also in the space 
of square integrable functions. Then the fundamental continuity theorem relating 
pointwise convergence of characteristic functions to weak convergence of the re- 
spective distributions is proved in Sect. 7.3. The result is illustrated by proving the 
Poisson theorem, with a bound for the convergence rate, in Sect. 7.4. After that, 
the previously presented theory is extended in Sect. 7.5 to the multivariate case. 
Some applications of characteristic functions are discussed in Sect. 7.6, including 
the stability properties of the normal and Cauchy distributions and an in-depth dis- 
cussion of the gamma distribution and its properties. Section 7.7 introduces the 
concept of generating functions and uses it to analyse the asymptotic behaviour 
of a simple Markov discrete time branching process. The obtained results include 
the formula for the eventual extinction probability, the asymptotic behaviour of the 
non-extinction probabilities in the critical case, and convergence in that case of the 
conditional distributions of the scaled population size given non-extinction to the 
exponential law. 


7.1 Definition and Properties of Characteristic Functions 


As a preliminary remark, note that together with real-valued random variables $\xi(\omega)$ 
we could also consider complex-valued random variables, by which we mean functions 
of the form $\xi_1(\omega) + i\xi_2(\omega)$, $(\xi_1, \xi_2)$ being a random vector. It is natural to 
put $\mathbf{E}(\xi_1 + i\xi_2) = \mathbf{E}\xi_1 + i\mathbf{E}\xi_2$. Complex-valued random variables $\xi = \xi_1 + i\xi_2$ and 
$\eta = \eta_1 + i\eta_2$ are independent if the $\sigma$-algebras $\sigma(\xi_1, \xi_2)$ and $\sigma(\eta_1, \eta_2)$ generated 
by the vectors $(\xi_1, \xi_2)$ and $(\eta_1, \eta_2)$, respectively, are independent. It is not hard to 
verify that, for such random variables, 

$$\mathbf{E}\xi\eta = \mathbf{E}\xi\,\mathbf{E}\eta.$$


A.A. Borovkov, Probability Theory, Universitext, 
DOI 10.1007/978-1-4471-5201-9_7, © Springer-Verlag London 2013 


Definition 7.1.1 The characteristic function (ch.f.) of a real-valued random variable 
$\xi$ is the complex-valued function 

$$\varphi_\xi(t) := \mathbf{E}e^{it\xi} = \int e^{itx}\,dF(x),$$

where $t$ is real. 
If the distribution function $F(x)$ has a density $f(x)$ then the ch.f. is equal to 

$$\varphi_\xi(t) = \int e^{itx} f(x)\,dx$$

and is just the Fourier transform of the function $f(x)$.¹ In the general case, the ch.f. 
is the Fourier–Stieltjes transform of the function $F(x)$. 

The ch.f. exists for any random variable $\xi$. This follows immediately from the 
relation 

$$|\varphi_\xi(t)| \le \int |e^{itx}|\,dF(x) \le \int dF(x) = 1.$$

Ch.f.s are a powerful tool for studying properties of the sums of independent random 
variables. 


7.1.1 Properties of Characteristic Functions 


1. For any random variable $\xi$, 

$$\varphi_\xi(0) = 1 \quad\text{and}\quad |\varphi_\xi(t)| \le 1 \quad\text{for all } t.$$

This property is obvious. 


2. For any random variable $\xi$, 

$$\varphi_{a\xi+b}(t) = e^{itb}\varphi_\xi(ta).$$

Indeed, 

$$\varphi_{a\xi+b}(t) = \mathbf{E}e^{it(a\xi+b)} = e^{itb}\,\mathbf{E}e^{iat\xi} = e^{itb}\varphi_\xi(ta).$$


¹More precisely, in classical mathematical analysis, the Fourier transform $\varphi(t)$ of a function $f$ 
from the space $L_1$ of integrable functions is defined by the equation 

$$\varphi(t) = \frac{1}{\sqrt{2\pi}} \int e^{itx} f(x)\,dx$$

(the difference from ch.f. consists in the factor $1/\sqrt{2\pi}$). Under this definition the inversion formula 
has a “symmetric” form: if $\varphi \in L_1$ then 

$$f(x) = \frac{1}{\sqrt{2\pi}} \int e^{-itx} \varphi(t)\,dt.$$

This representation is more symmetric than the inversion formula for ch.f. (7.2.1) in Sect. 7.2 
below. 


3. If $\xi_1, \ldots, \xi_n$ are independent random variables then the ch.f. of the sum $S_n = 
\xi_1 + \cdots + \xi_n$ is equal to 

$$\varphi_{S_n}(t) = \varphi_{\xi_1}(t) \cdots \varphi_{\xi_n}(t).$$


Proof This follows from the properties of the expectation of the product of independent 
random variables. Indeed, 

$$\varphi_{S_n}(t) = \mathbf{E}e^{it(\xi_1 + \cdots + \xi_n)} = \mathbf{E}e^{it\xi_1} e^{it\xi_2} \cdots e^{it\xi_n} = \mathbf{E}e^{it\xi_1}\,\mathbf{E}e^{it\xi_2} \cdots \mathbf{E}e^{it\xi_n} = \varphi_{\xi_1}(t)\varphi_{\xi_2}(t) \cdots \varphi_{\xi_n}(t).$$


Thus to the convolution $F_{\xi_1} * F_{\xi_2}$ there corresponds the product $\varphi_{\xi_1}\varphi_{\xi_2}$. 
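Property 3 can be checked directly for a sum of independent Bernoulli variables, whose law is binomial. The sketch below computes the ch.f. of the binomial law from its probabilities and compares it with the $n$-th power of the Bernoulli ch.f.; the parameter values and the helper name `chf_discrete` are our own choices.

```python
import cmath
from math import comb

def chf_discrete(pmf, t):
    """E exp(i t xi) for an integer-valued xi with pmf {k: P(xi = k)}."""
    return sum(p * cmath.exp(1j * t * k) for k, p in pmf.items())

n, p = 7, 0.3
bernoulli = {0: 1 - p, 1: p}
binomial = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

for t in (-2.0, 0.5, 3.1):
    lhs = chf_discrete(binomial, t)          # ch.f. of S_n computed from its law
    rhs = chf_discrete(bernoulli, t) ** n    # product of the n Bernoulli ch.f.s
    print(t, abs(lhs - rhs))                 # differences at rounding level
```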
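4. The ch.f. $\varphi_\xi(t)$ is a uniformly continuous function. 
Indeed, as $h \to 0$, 

$$|\varphi(t + h) - \varphi(t)| = \bigl|\mathbf{E}\bigl(e^{i(t+h)\xi} - e^{it\xi}\bigr)\bigr| \le \mathbf{E}\bigl|e^{ih\xi} - 1\bigr| \to 0$$

by the dominated convergence theorem (see Corollary 6.1.2) since $|e^{ih\xi} - 1| \xrightarrow{p} 0$ 
as $h \to 0$, and $|e^{ih\xi} - 1| \le 2$. 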


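5. If the $k$-th moment exists: $\mathbf{E}|\xi|^k < \infty$, $k \ge 1$, then there exists a continuous $k$-th 
derivative of the function $\varphi_\xi(t)$, and $\varphi^{(k)}(0) = i^k\,\mathbf{E}\xi^k$. 


Proof Indeed, since 

$$\Bigl|\int ixe^{itx}\,dF(x)\Bigr| \le \int |x|\,dF(x) = \mathbf{E}|\xi| < \infty,$$

the integral $\int ixe^{itx}\,dF(x)$ converges uniformly in $t$. Therefore one can differentiate 
under the integral sign: 

$$\varphi'(t) = i\int xe^{itx}\,dF(x), \qquad \varphi'(0) = i\,\mathbf{E}\xi.$$

Further, one can argue by induction. If, for $l < k$, 

$$\varphi^{(l)}(t) = i^l \int x^l e^{itx}\,dF(x),$$

then 

$$\varphi^{(l+1)}(t) = i^{l+1} \int x^{l+1} e^{itx}\,dF(x)$$

by the uniform convergence of the integral on the right-hand side. Therefore 

$$\varphi^{(l+1)}(0) = i^{l+1}\,\mathbf{E}\xi^{l+1}.$$


Property 5 implies that if $\mathbf{E}|\xi|^k < \infty$ then, in a neighbourhood of the point $t = 0$, 
one has the expansion 

$$\varphi_\xi(t) = 1 + \sum_{j=1}^{k} \frac{(it)^j}{j!}\,\mathbf{E}\xi^j + o\bigl(|t^k|\bigr). \tag{7.1.1}$$

The converse assertion is only partially true: 
If a derivative of an even order $\varphi^{(2k)}$ exists then 

$$\mathbf{E}|\xi|^{2k} < \infty, \qquad \varphi^{(2k)}(0) = (-1)^k\,\mathbf{E}\xi^{2k}.$$

We will prove the property for $k = 1$ (for $k > 1$ one can employ induction). It 
suffices to verify that $\mathbf{E}|\xi|^2$ is finite. One has 

$$\frac{2\varphi(0) - \varphi(2h) - \varphi(-2h)}{4h^2} = -\mathbf{E}\Bigl(\frac{e^{ih\xi} - e^{-ih\xi}}{2h}\Bigr)^2 = \mathbf{E}\,\frac{\sin^2 h\xi}{h^2}.$$

Since $h^{-2}\sin^2 h\xi \to \xi^2$ as $h \to 0$, by Fatou’s lemma 

$$-\varphi''(0) = \lim_{h\to 0} \frac{2\varphi(0) - \varphi(2h) - \varphi(-2h)}{4h^2} = \lim_{h\to 0} \mathbf{E}\,\frac{\sin^2 h\xi}{h^2} \ge \mathbf{E}\lim_{h\to 0} \frac{\sin^2 h\xi}{h^2} = \mathbf{E}\xi^2.$$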

6. If $\xi \ge 0$ then $\varphi_\xi(\lambda)$ is defined in the complex plane for $\operatorname{Im}\lambda \ge 0$. Moreover, 
$|\varphi_\xi(\lambda)| \le 1$ for such $\lambda$, and in the domain $\operatorname{Im}\lambda > 0$, $\varphi_\xi(\lambda)$ is analytic and 
continuous including on the boundary $\operatorname{Im}\lambda = 0$. 


Proof That $\varphi(\lambda)$ is analytic follows from the fact that, for $\operatorname{Im}\lambda > 0$, one can differentiate 
under the integral sign the right-hand side of 

$$\varphi_\xi(\lambda) = \int_0^\infty e^{i\lambda x}\,dF(x).$$

(For $\operatorname{Im}\lambda > 0$ the integrand decreases exponentially fast as $x \to \infty$.) 


Continuity is proved in the same way as in property 4. This means that for non-negative 
$\xi$ the ch.f. $\varphi_\xi(\lambda)$ uniquely determines the function 

$$\psi(s) = \varphi_\xi(is) = \mathbf{E}e^{-s\xi}$$

of real variable $s \ge 0$, which is called the Laplace (or Laplace–Stieltjes) transform 
of the distribution of $\xi$. 

The converse assertion also follows from properties of analytic functions: the 
Laplace transform $\psi(s)$ on the half-line $s \ge 0$ uniquely determines the ch.f. $\varphi_\xi(\lambda)$. 


7. $\overline{\varphi_\xi(t)} = \varphi_\xi(-t) = \varphi_{-\xi}(t)$, where the bar denotes the complex conjugate. 


Proof The relations follow from the equalities 

$$\overline{\varphi_\xi(t)} = \overline{\mathbf{E}e^{it\xi}} = \mathbf{E}\,\overline{e^{it\xi}} = \mathbf{E}e^{-it\xi}.$$


This implies the following property. 


7A. If $\xi$ is symmetric (has the same distribution as $-\xi$) then its ch.f. is real ($\varphi_\xi(t) = 
\varphi_\xi(-t)$). 

One can show that the converse is also true; to this end one has to make use of 
the uniqueness theorem to be discussed below. 

Now we will find the ch.f.s of the basic probability laws. 


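Example 7.1.1 If $\xi = a$ with probability 1, i.e. $\xi \mathrel{\subset\!\!\!=} \mathbf{I}_a$, then $\varphi_\xi(t) = e^{ita}$. 

Example 7.1.2 If $\xi \mathrel{\subset\!\!\!=} \mathbf{B}_p$ then $\varphi_\xi(t) = pe^{it} + (1 - p) = 1 + p(e^{it} - 1)$. 


Example 7.1.3 If $\xi \mathrel{\subset\!\!\!=} \boldsymbol{\Phi}_{0,1}$ then $\varphi_\xi(t) = e^{-t^2/2}$. 
Indeed, 

$$\varphi(t) = \varphi_\xi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{itx - x^2/2}\,dx.$$

Differentiating with respect to $t$ and integrating by parts ($xe^{-x^2/2}\,dx = -de^{-x^2/2}$), 
we get 

$$\varphi'(t) = \frac{1}{\sqrt{2\pi}} \int ixe^{itx - x^2/2}\,dx = -\frac{1}{\sqrt{2\pi}} \int te^{itx - x^2/2}\,dx = -t\varphi(t),$$

$$(\ln\varphi(t))' = -t, \qquad \ln\varphi(t) = -\frac{t^2}{2} + c.$$

Since $\varphi(0) = 1$, one has $c = 0$ and $\varphi(t) = e^{-t^2/2}$. 


Now let $\eta$ be a normal random variable with parameters $(a, \sigma^2)$. Then it can be 
represented as $\eta = \sigma\xi + a$, where $\xi$ is normally distributed with parameters (0, 1). 
The ch.f. of $\eta$ can be found using Property 2: 

$$\varphi_\eta(t) = e^{ita} e^{-(t\sigma)^2/2} = e^{ita - t^2\sigma^2/2}.$$

Differentiating $\varphi_\eta(t)$ for $\eta \mathrel{\subset\!\!\!=} \boldsymbol{\Phi}_{0,\sigma^2}$, we will obtain that $\mathbf{E}\eta^k = 0$ for odd $k$, and 
$\mathbf{E}\eta^k = \sigma^k (k - 1)(k - 3) \cdots 1$ for $k = 2, 4, \ldots$. 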
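The closed form $e^{-t^2/2}$ for the standard normal ch.f. can be compared with a direct numerical evaluation of the defining integral. The following sketch assumes a trapezoidal rule on a truncated range; the function name and the parameters `half_width` and `steps` are hypothetical.

```python
import cmath
import math

def chf_normal_numeric(t, half_width=12.0, steps=24_000):
    """Trapezoidal value of (2*pi)^(-1/2) * integral of exp(i*t*x - x^2/2) dx."""
    h = 2 * half_width / steps
    total = 0j
    for k in range(steps + 1):
        x = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0   # end points get half weight
        total += weight * cmath.exp(1j * t * x - x * x / 2)
    return h * total / math.sqrt(2 * math.pi)

for t in (0.0, 0.7, 2.0):
    print(t, abs(chf_normal_numeric(t) - math.exp(-t * t / 2)))
```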


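Example 7.1.4 If $\xi \mathrel{\subset\!\!\!=} \boldsymbol{\Pi}_\mu$, then 

$$\varphi_\xi(t) = \mathbf{E}e^{it\xi} = \sum_{k=0}^{\infty} e^{itk}\,\frac{\mu^k}{k!}\,e^{-\mu} = e^{-\mu} \sum_{k=0}^{\infty} \frac{(\mu e^{it})^k}{k!} = e^{-\mu}\exp\{\mu e^{it}\} = \exp\bigl[\mu(e^{it} - 1)\bigr].$$


Example 7.1.5 If $\xi$ has the exponential distribution $\boldsymbol{\Gamma}_\alpha$ with density $\alpha e^{-\alpha x}$ for 
$x \ge 0$, then 

$$\varphi_\xi(t) = \alpha \int_0^\infty e^{itx - \alpha x}\,dx = \frac{\alpha}{\alpha - it}.$$

Therefore, if $\xi$ has the “double” exponential distribution with density $\frac{1}{2}e^{-|x|}$, $-\infty < 
x < \infty$, then 

$$\varphi_\xi(t) = \frac{1}{2}\Bigl(\frac{1}{1 - it} + \frac{1}{1 + it}\Bigr) = \frac{1}{1 + t^2}.$$

If $\xi$ has the geometric distribution $\mathbf{P}(\xi = k) = (1 - p)p^k$, $k = 0, 1, \ldots$, then 

$$\varphi_\xi(t) = \frac{1 - p}{1 - pe^{it}}.$$


Example 7.1.6 If $\xi \mathrel{\subset\!\!\!=} \mathbf{K}_{0,1}$ (has the density $[\pi(1 + x^2)]^{-1}$) then $\varphi_\xi(t) = e^{-|t|}$. The 
reader will easily be able to prove this somewhat later, using the inversion formula 
and Example 7.1.5. 


Example 7.1.7 If $\xi \mathrel{\subset\!\!\!=} \mathbf{U}_{0,1}$, then 

$$\varphi_\xi(t) = \int_0^1 e^{itx}\,dx = \frac{e^{it} - 1}{it}.$$

By virtue of Property 3, the ch.f.s of the sums $\xi_1 + \xi_2$, $\xi_1 + \xi_2 + \xi_3, \ldots$ that we 
considered in Example 3.6.1 will be equal to 

$$\varphi_{\xi_1+\xi_2}(t) = -\frac{(e^{it} - 1)^2}{t^2}, \qquad \varphi_{\xi_1+\xi_2+\xi_3}(t) = -\frac{(e^{it} - 1)^3}{it^3}, \quad \ldots$$


We return to the general case. How can one verify whether one or another function 
$\varphi$ is characteristic or not? Sometimes one can do this using the above properties. 
We suggest that the reader determine whether the functions $(1 + t^2)^{-1}$, $1 + t$, $\sin t$, $\cos t$ 
are characteristic, and if so, to which distributions they correspond. 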

In the general case the posed question is a difficult one. We state without proof 
one of the known results. 


Bochner–Khinchin’s Theorem A necessary and sufficient condition for a continuous 
function $\varphi(t)$ with $\varphi(0) = 1$ to be characteristic is that it is nonnegative 
definite, i.e., for any real $t_1, \ldots, t_n$ and complex $\lambda_1, \ldots, \lambda_n$, one has 

$$\sum_{k,j=1}^{n} \varphi(t_k - t_j)\,\lambda_k \bar\lambda_j \ge 0$$

($\bar\lambda$ is the complex conjugate of $\lambda$). 


Note that the necessity of this condition is almost obvious, for if $\varphi(t) = \mathbf{E}e^{it\xi}$ 
then 

$$\sum_{k,j=1}^{n} \varphi(t_k - t_j)\,\lambda_k \bar\lambda_j = \mathbf{E}\sum_{k,j=1}^{n} e^{i(t_k - t_j)\xi}\,\lambda_k \bar\lambda_j = \mathbf{E}\Bigl|\sum_{k=1}^{n} \lambda_k e^{it_k\xi}\Bigr|^2 \ge 0.$$
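The necessity part can be observed numerically for a known ch.f. The sketch below assumes the standard normal ch.f. $e^{-t^2/2}$ and randomly generated points $t_k$ and coefficients $\lambda_k$ (seeded for reproducibility); `quadratic_form` is a hypothetical helper evaluating the double sum from the theorem.

```python
import math
import random

def quadratic_form(phi, ts, lams):
    """Sum over k, j of phi(t_k - t_j) * lam_k * conj(lam_j)."""
    return sum(phi(tk - tj) * lk * lj.conjugate()
               for tk, lk in zip(ts, lams)
               for tj, lj in zip(ts, lams))

phi_normal = lambda t: math.exp(-t * t / 2)    # ch.f. of the standard normal law

random.seed(1)
ts = [random.uniform(-3, 3) for _ in range(6)]
lams = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(6)]

q = quadratic_form(phi_normal, ts, lams)
print(q.real, abs(q.imag))    # nonnegative real part, imaginary part near 0
```

A single random trial of this kind is of course no proof; it only shows the quantity from the theorem behaving as the identity above predicts.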


7.1.2 The Properties of Ch.F.s Related to the Structure of the 
Distribution of $\xi$ 


8. If the distribution of $\xi$ has a density then $\varphi_\xi(t) \to 0$ as $|t| \to \infty$. 

This is a direct consequence of the Lebesgue theorem on Fourier transforms. The 
converse assertion is false. 

In general, the smoother $F(x)$ is the faster $\varphi_\xi(t)$ vanishes as $|t| \to \infty$. The formulas 
in Example 7.1.7 are typical in this respect. If the density $f(x)$ has an integrable 
$k$-th derivative then, by integrating by parts, we get 

$$\varphi_\xi(t) = \int e^{itx} f(x)\,dx = -\frac{1}{it} \int e^{itx} f'(x)\,dx = \cdots = \frac{(-1)^k}{(it)^k} \int e^{itx} f^{(k)}(x)\,dx,$$

which implies that 

$$|\varphi_\xi(t)| \le \frac{c}{|t|^k}.$$

8A. If the distribution of $\xi$ has a density of bounded variation then 

$$|\varphi_\xi(t)| \le \frac{c}{|t|}.$$

This property is also validated by integration by parts: 

$$|\varphi_\xi(t)| = \Bigl|\frac{1}{it} \int e^{itx}\,df(x)\Bigr| \le \frac{1}{|t|} \int |df(x)|.$$

9. A random variable & has a lattice distribution with span h > 0 (see Defini- 
tion 3.2.3) if and only if 


mG i=® eG) 


if v is not a multiple of 21. 
Clearly, without loss of generality we can assume h = 1. Moreover, since 


<1 2) 


’ 


|oe—a(t)| = |e" ge (D)| = |ge (1) 


the properties (7.1.2) are invariant with respect to the shift by a. Thus we can as- 
sume the shift a is equal to zero and thus change the lattice distribution condition 
in Property 9 to the arithmeticity condition (see Definition 3.2.3). Since g(t) is a 
periodic function, Property 9 can be rewritten in the following equivalent form: 


160 7 Characteristic Functions 


The distribution of a random variable & is arithmetic if and only if 


ge (20) = 1, |ye(t)| <1 for allt € (0, 277). (7.1.3) 


Proof If & has an arithmetic distribution then 


ge(t) = SUPE =e =1 
k 


for t = 27r. Now let us prove the second relation in (7.1.3). Assume the contrary: 
for some v € (0, 277), we have |g¢(v)| = 1 or, which is the same, 


ibv 


ge(v) =e 
for some real b. The last relation implies that 
ge—p(v) = 1 =Ecosv(é — b) + iEsinv(é — b), E[1 —cosvu(é — b)| = 0. 


Hence, by Property E4 in Sect. 4.1, cosu(€ — b) = 1 and v(é — b) = 277k(@) with 
probability 1, where k(@) is an integer. Thus € — b is a multiple of 27/v > 1. 
This contradicts the assumption that the span of the lattice equals 1, and hence 
proves (7.1.3). 

Conversely, let (7.1.3) hold. As we saw, the first relation in (7.1.3) implies that $\xi$ takes only integer values. If we assume that the lattice span equals $h > 1$ then, by the first part of the proof and the first relation in (7.1.2), we get $|\varphi_\xi(2\pi/h)| = 1$ with $2\pi/h \in (0, 2\pi)$, which contradicts the second relation in (7.1.3). Property 9 is proved.
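
Property 9 is easy to observe numerically. The following is a minimal sketch rather than part of the original text: the two probability tables and the helper name `lattice_chf` are illustrative assumptions. It evaluates the ch.f. of an integer-valued variable with span 1 and of one with span 2.

```python
import cmath
import math


def lattice_chf(pmf, t):
    """Ch.f. E exp(i*t*xi) of an integer-valued variable given as {value: probability}."""
    return sum(p * cmath.exp(1j * t * k) for k, p in pmf.items())


# Hypothetical example distributions (not taken from the text).
pmf_span1 = {0: 0.5, 1: 0.3, 3: 0.2}  # arithmetic with span 1
pmf_span2 = {0: 0.5, 2: 0.3, 4: 0.2}  # lattice with span 2

# Interior points of the interval (0, 2*pi).
grid = [2 * math.pi * j / 200 for j in range(1, 200)]
max_mod_span1 = max(abs(lattice_chf(pmf_span1, t)) for t in grid)

print(abs(lattice_chf(pmf_span1, 2 * math.pi)))  # modulus at t = 2*pi
print(max_mod_span1)                             # largest modulus on the grid
print(abs(lattice_chf(pmf_span2, math.pi)))      # span 2: modulus at t = 2*pi/2
```

For the span-1 table the modulus stays below 1 at every grid point of $(0, 2\pi)$, whereas for the span-2 table it already returns to 1 at $t = 2\pi/h = \pi$.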


The next definition looks like a tautology. 


Definition 7.1.2 The distribution of $\xi$ is called non-lattice if it is not a lattice distribution.

10. If the distribution of $\xi$ is non-lattice then

$$|\varphi_\xi(t)| < 1 \quad \text{for all } t \ne 0.$$

Proof Indeed, if we assume the contrary, i.e. that $|\varphi_\xi(u)| = 1$ for some $u \ne 0$, then, by Property 9, we conclude that the distribution of $\xi$ is a lattice with span $h = 2\pi/u$ or with a lesser span.


11. If the distribution of $\xi$ has an absolutely continuous component of a positive mass $p > 0$, then it is clearly non-lattice and, moreover,

$$\limsup_{|t|\to\infty} |\varphi_\xi(t)| \le 1 - p.$$

This assertion follows from Property 8.

Arithmetic distributions occupy an important place in the class of lattice distri- 
butions. 

For arithmetic distributions, the ch.f. $\varphi_\xi(t)$ is a function of the variable $z = e^{it}$ and is periodic in $t$ with period $2\pi$. Hence, in this case it is sufficient to know the behaviour of the ch.f. on the interval $[-\pi, \pi]$ or, which is the same, to know the behaviour of the function

$$p_\xi(z) := \mathbf{E}z^{\xi} = \sum_k z^k\,\mathbf{P}(\xi = k)$$

on the unit circle $|z| = 1$.


Definition 7.1.3 The function $p_\xi(z)$ is called the generating function of the random variable $\xi$ (or of the distribution of $\xi$).


Since $p_\xi(e^{it}) = \varphi_\xi(t)$ is a ch.f., all the properties of ch.f.s remain valid for generating functions, with the only changes corresponding to the change of variable. For more on applications of generating functions, see Sect. 7.7.


7.2 Inversion Formulas 


Thus for any random variable there exists a corresponding ch.f. We will now show that the set $\mathcal{L}$ of functions $e^{itx}$ is a distribution determining class, i.e. that the distribution can be uniquely reconstructed from its ch.f. This is proved using inversion formulas.


7.2.1 The Inversion Formula for Densities 


Theorem 7.2.1 If the ch.f. $\varphi_\xi(t)$ of a random variable $\xi$ is integrable then the distribution of $\xi$ has the bounded density

$$f(x) = \frac{1}{2\pi}\int e^{-itx}\varphi_\xi(t)\,dt. \tag{7.2.1}$$


This fact is known from classical Fourier analysis, but we shall give a proof of a 
probabilistic character. 


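Proof First we will establish the following (Parseval's) identity: for any fixed $\varepsilon > 0$,

$$p_\varepsilon(t) := \frac{1}{2\pi}\int e^{-itu}\varphi(u)\,e^{-\varepsilon^2u^2/2}\,du = \frac{1}{\sqrt{2\pi}\,\varepsilon}\int \exp\left\{-\frac{(u-t)^2}{2\varepsilon^2}\right\}\mathbf{F}(du), \tag{7.2.2}$$

where $\mathbf{F}$ is the distribution of $\xi$. We begin with the equality

$$\frac{1}{\sqrt{2\pi}}\int \exp\left\{ix\,\frac{\xi - t}{\varepsilon} - \frac{x^2}{2}\right\}dx = \exp\left\{-\frac{(\xi - t)^2}{2\varepsilon^2}\right\}, \tag{7.2.3}$$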


both sides of which are the value of the ch.f. of the normal distribution with parameters (0, 1) at the point $(\xi - t)/\varepsilon$. After changing the variable $x = \varepsilon u$, the left-hand side of this equality can be rewritten as

$$\frac{\varepsilon}{\sqrt{2\pi}}\int \exp\left\{iu(\xi - t) - \frac{\varepsilon^2u^2}{2}\right\}du.$$

If we take expectations of both sides of (7.2.3), we obtain

$$\frac{\varepsilon}{\sqrt{2\pi}}\int e^{-iut}\varphi(u)\,e^{-\varepsilon^2u^2/2}\,du = \int \exp\left\{-\frac{(u-t)^2}{2\varepsilon^2}\right\}\mathbf{F}(du).$$

This proves (7.2.2).


To prove the theorem first consider the left-hand side of the equality (7.2.2). Since $e^{-\varepsilon^2u^2/2} \to 1$ as $\varepsilon \to 0$, $|e^{-\varepsilon^2u^2/2}| \le 1$ and $\varphi(u)$ is integrable, as $\varepsilon \to 0$ one has

$$p_\varepsilon(t) \to \frac{1}{2\pi}\int e^{-itu}\varphi(u)\,du = p_0(t) \tag{7.2.4}$$

uniformly in $t$, because the integral on the left-hand side of (7.2.2) is uniformly continuous in $t$. This implies, in particular, that

$$\int_a^b p_\varepsilon(t)\,dt \to \int_a^b p_0(t)\,dt. \tag{7.2.5}$$


Now consider the right-hand side of (7.2.2). It represents the density of the sum $\xi + \varepsilon\eta$, where $\xi$ and $\eta$ are independent and $\eta \in \Phi_{0,1}$. Therefore

$$\int_a^b p_\varepsilon(t)\,dt = \mathbf{P}(a < \xi + \varepsilon\eta < b). \tag{7.2.6}$$

Since $\xi + \varepsilon\eta \xrightarrow{p} \xi$ as $\varepsilon \to 0$ and the limit $\lim_{\varepsilon\to0}\int_a^b p_\varepsilon(t)\,dt$ exists for any fixed $a$ and $b$ by virtue of (7.2.5), this limit (see (7.2.6)) cannot be anything other than $\mathbf{F}([a, b))$. Thus, from (7.2.5) and (7.2.6) we get

$$\int_a^b p_0(t)\,dt = \mathbf{F}([a, b)).$$

This means that the distribution $\mathbf{F}$ has the density $p_0(t)$, which is defined by relation (7.2.4). The boundedness of $p_0(t)$ evidently follows from the integrability of $\varphi$:

$$p_0(t) \le \frac{1}{2\pi}\int |\varphi(t)|\,dt < \infty.$$


The theorem is proved. 
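
The inversion formula (7.2.1) can be tried out numerically. The sketch below is an illustration only: the truncation level `T`, the number of nodes `n` and the function names are assumptions, and the integral over the whole line is replaced by a trapezoid sum over $[-T, T]$. It recovers the standard normal density from its ch.f. $e^{-t^2/2}$.

```python
import cmath
import math


def density_from_chf(chf, x, T=40.0, n=4000):
    """Approximate (1/(2*pi)) * integral_{-T}^{T} exp(-i*t*x) * chf(t) dt by the trapezoid rule."""
    h = 2 * T / n
    total = 0.0
    for j in range(n + 1):
        t = -T + j * h
        weight = 0.5 if j in (0, n) else 1.0
        # The exact integral is real, so only the real part is accumulated.
        total += weight * (cmath.exp(-1j * t * x) * chf(t)).real
    return total * h / (2 * math.pi)


def normal_chf(t):
    """Ch.f. of the normal distribution with parameters (0, 1)."""
    return math.exp(-t * t / 2)


x = 0.7
approx = density_from_chf(normal_chf, x)
exact = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
print(approx, exact)
```

The truncation is harmless here because the ch.f. decays very fast; for a slowly decaying integrable ch.f. a much larger `T` would be needed.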




7.2.2 The Inversion Formula for Distributions 


Theorem 7.2.2 If $F(x)$ is the distribution function of a random variable $\xi$ and $\varphi(t)$ is its ch.f., then, for any points of continuity $x$ and $y$ of the function $F(x)$,²

$$F(y) - F(x) = \frac{1}{2\pi}\lim_{\sigma\to0}\int \frac{e^{-itx} - e^{-ity}}{it}\,\varphi(t)\,e^{-t^2\sigma^2}\,dt. \tag{7.2.7}$$

If the function $\varphi(t)/t$ is integrable at infinity then the passage to the limit under the integral sign is justified and one can write

$$F(y) - F(x) = \frac{1}{2\pi}\int \frac{e^{-itx} - e^{-ity}}{it}\,\varphi(t)\,dt. \tag{7.2.8}$$


Proof Suppose first that the ch.f. $\varphi(t)$ is integrable. Then $F(x)$ has a density $f(x)$ and the assertion of the theorem in the form (7.2.8) follows if we integrate both sides of Eq. (7.2.1) over the interval with the end points $x$ and $y$ and change the order of integration (which is valid because of the absolute convergence).³

Now let $\varphi(t)$ be the characteristic function of a random variable $\xi$ with an arbitrary distribution $\mathbf{F}$. On a common probability space with $\xi$, consider a random variable $\eta$ which is independent of $\xi$ and has the normal distribution with parameters $(0, 2\sigma^2)$. As we have already pointed out, the ch.f. of $\eta$ is $e^{-t^2\sigma^2}$.

This means that the ch.f. of $\xi + \eta$, being equal to $\varphi(t)e^{-t^2\sigma^2}$, is integrable. Therefore by (7.2.8) one will have

$$F_{\xi+\eta}(y) - F_{\xi+\eta}(x) = \frac{1}{2\pi}\int \frac{e^{-itx} - e^{-ity}}{it}\,\varphi(t)\,e^{-t^2\sigma^2}\,dt. \tag{7.2.9}$$

Since $\eta \xrightarrow{p} 0$ as $\sigma \to 0$, we have $F_{\xi+\eta} \Rightarrow F$ (see Chap. 6). Therefore, if $x$ and $y$ are points of continuity of $F$, then $F(y) - F(x) = \lim_{\sigma\to0}\bigl(F_{\xi+\eta}(y) - F_{\xi+\eta}(x)\bigr)$. This, together with (7.2.9), proves the assertion of the theorem.


In the proof of Theorem 7.2.2 we used a method which might be called the 
“smoothing” of distributions. It is often employed to overcome technical difficul- 
ties related to the inversion formula. 


Corollary 7.2.1 (Uniqueness Theorem) The ch.f. of a random variable uniquely 
determines its distribution function. 


²In the literature, the inversion formula is often given in the form

$$F(y) - F(x) = \frac{1}{2\pi}\lim_{A\to\infty}\int_{-A}^{A} \frac{e^{-itx} - e^{-ity}}{it}\,\varphi(t)\,dt,$$

which is equivalent to (7.2.7).


³Formula (7.2.8) can also be obtained from (7.2.1) without integration by noting that $(F(y) - F(x))/(y - x)$ is the value at zero of the convolution of two densities: $f(x)$ and the uniform density over the interval $[-y, -x]$ (see also the remark at the end of Sect. 3.6). The ch.f. of the convolution is equal to $\dfrac{e^{-itx} - e^{-ity}}{(y - x)\,it}\,\varphi(t)$.


The proof follows from the inversion formula and the fact that F is uniquely 
determined by the differences F(y) — F(x). 

For lattice random variables the inversion formula becomes simpler. Let, for the sake of simplicity, $\xi$ be an integer-valued random variable.


Theorem 7.2.3 If $p_\xi(z) = \mathbf{E}z^{\xi}$ is the generating function of an arithmetic random variable then

$$\mathbf{P}(\xi = k) = \frac{1}{2\pi i}\int_{|z|=1} p_\xi(z)\,z^{-k-1}\,dz. \tag{7.2.10}$$


Proof Turning to the ch.f. $\varphi_\xi(t) = \sum_j e^{itj}\,\mathbf{P}(\xi = j)$ and changing the variables $z = e^{it}$ in (7.2.10) we see that the right-hand side of (7.2.10) equals

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-itk}\varphi_\xi(t)\,dt = \frac{1}{2\pi}\sum_j \mathbf{P}(\xi = j)\int_{-\pi}^{\pi} e^{it(j-k)}\,dt.$$

Here all the integrals on the right-hand side are equal to zero, except for the integral with $j = k$ which is equal to $2\pi$. Thus the right-hand side itself equals $\mathbf{P}(\xi = k)$. The theorem is proved.
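
As an illustration of (7.2.10) in its ch.f. form, the following sketch recovers Poisson probabilities from the ch.f. $\exp\{\mu(e^{it} - 1)\}$. The value of `mu`, the number of nodes `n` and the function names are assumptions made for the example; the integral over $[-\pi, \pi]$ is replaced by an equally spaced sum, which is very accurate for periodic integrands.

```python
import cmath
import math


def pmf_from_chf(chf, k, n=256):
    """Approximate (1/(2*pi)) * integral_{-pi}^{pi} exp(-i*t*k) * chf(t) dt by an n-point sum."""
    total = 0j
    for j in range(n):
        t = -math.pi + 2 * math.pi * j / n
        total += cmath.exp(-1j * t * k) * chf(t)
    return (total / n).real


mu = 2.0  # hypothetical Poisson parameter


def poisson_chf(t):
    """Ch.f. of the Poisson distribution with parameter mu."""
    return cmath.exp(mu * (cmath.exp(1j * t) - 1))


recovered = [pmf_from_chf(poisson_chf, k) for k in range(6)]
exact = [math.exp(-mu) * mu ** k / math.factorial(k) for k in range(6)]
print(recovered)
print(exact)
```

The $n$-point sum actually returns $\sum_m \mathbf{P}(\xi = k + mn)$, so `n` should be large enough for the masses beyond $n$ to be negligible.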


Formula (7.2.10) is nothing else but the formula for Fourier coefficients and has a simple geometric interpretation. The functions $\{e_k = e^{itk}\}$ form an orthonormal basis in the Hilbert space $L_2(-\pi, \pi)$ of square integrable complex-valued functions with the inner product

$$(f, g) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)\,\overline{g}(t)\,dt$$

($\overline{g}$ is the complex conjugate of $g$). If $\varphi_\xi = \sum_k e_k\,\mathbf{P}(\xi = k)$ then it immediately follows from the equality $\varphi_\xi = \sum_k e_k(\varphi_\xi, e_k)$ that

$$\mathbf{P}(\xi = k) = (\varphi_\xi, e_k) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-itk}\varphi_\xi(t)\,dt.$$


7.2.3 The Inversion Formula in L2. The Class of Functions that 
Are Both Densities and Ch.F-s 


First consider some properties of ch.f.s related to the inversion formula. As a preliminary, note that, in classical Fourier analysis, one also considers the Fourier transforms of functions $f$ from the space $L_2$ of square-integrable functions. Since in this case a function $f$ is not necessarily integrable, the Fourier transform is defined as the integral in the principal value sense:⁴

$$\varphi(t) := \lim_{N\to\infty}\varphi_{(N)}(t), \qquad \varphi_{(N)}(t) := \int_{-N}^{N} e^{itx} f(x)\,dx, \tag{7.2.11}$$

where the limit is taken in the sense of convergence in $L_2$:

$$\int \bigl|\varphi(t) - \varphi_{(N)}(t)\bigr|^2\,dt \to 0 \quad \text{as } N \to \infty.$$

Since by Parseval's equality

$$\|f\|_{L_2} = \frac{1}{\sqrt{2\pi}}\,\|\varphi\|_{L_2}, \quad \text{where } \|g\|_{L_2} = \left[\int |g|^2(t)\,dt\right]^{1/2},$$

the Fourier transform maps the space $L_2$ into itself (there is no such isometricity for Fourier transforms in $L_1$). Here the inversion formula (7.2.1) holds true but the integral in (7.2.1) is understood in the principal value sense.

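Denote by $\mathcal{F}$ and $\mathcal{H}$ the class of all densities and the class of all ch.f.s, respectively, and by $\mathcal{H}_{1,+} \subset L_1$ the class of nonnegative real-valued integrable ch.f.s, so that the elements of $\mathcal{H}_{1,+}$ are in $\mathcal{F}$ up to the normalising factors. Further, let $(\mathcal{H}_{1,+})^{(-1)}$ be the inverse image of the class $\mathcal{H}_{1,+}$ in $\mathcal{F}$ for the mapping $f \to \varphi$, i.e. the class of densities whose ch.f.s lie in $\mathcal{H}_{1,+}$. It is clear that functions $f$ from $(\mathcal{H}_{1,+})^{(-1)}$ and $\varphi$ from $\mathcal{H}_{1,+}$ are necessarily symmetric (see Property 7A in Sect. 7.1) and that $f(0) \in (0, \infty)$. The last relation follows from the fact that, by the inversion formula for $\varphi \in \mathcal{H}_{1,+}$, we have

$$\|\varphi\| := \|\varphi\|_{L_1} = \int \varphi(t)\,dt = 2\pi f(0).$$

Further, denote by $(\mathcal{H}_{1,+})_{\|\cdot\|}$ the class of normalised functions $\varphi/\|\varphi\|$, $\varphi \in \mathcal{H}_{1,+}$, so that $(\mathcal{H}_{1,+})_{\|\cdot\|} \subset \mathcal{F}$, and denote by $\mathcal{F}^{(2,*)}$ the class of convolutions of symmetric densities from $L_2$:

$$\mathcal{F}^{(2,*)} := \bigl\{ f^{(2)*} : f \in L_2,\ f \text{ is symmetric} \bigr\},$$

where

$$f^{(2)*}(x) = \int f(t)\,f(x - t)\,dt.$$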


Theorem 7.2.4 The following relations hold true:

$$(\mathcal{H}_{1,+})^{(-1)} = (\mathcal{H}_{1,+})_{\|\cdot\|}, \qquad \mathcal{F}^{(2,*)} \subset (\mathcal{H}_{1,+})^{(-1)}.$$

The class $(\mathcal{H}_{1,+})_{\|\cdot\|}$ may be called the class of densities conjugate to $f \in (\mathcal{H}_{1,+})^{(-1)}$. It turns out that this class coincides with the inverse image $(\mathcal{H}_{1,+})^{(-1)}$. The second statement of the theorem shows that this inverse image is a very rich class and provides sufficient conditions for the density $f$ to have a conjugate. We will need these conditions in Sect. 8.7.


⁴Here we again omit the factor $\frac{1}{\sqrt{2\pi}}$ (cf. the footnote on page 154).


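Proof of Theorem 7.2.4 Let $f \in (\mathcal{H}_{1,+})^{(-1)}$. Then the corresponding ch.f. $\varphi$ is in $\mathcal{H}_{1,+}$ and the inversion formula (7.2.1) is applicable. Multiplying its right-hand side by $\frac{2\pi}{\|\varphi\|}$, we obtain an expression for the ch.f. (at the point $-x$) of the density $\frac{\varphi}{\|\varphi\|}$ (recall that $\varphi \ge 0$ is symmetric if $\varphi \in \mathcal{H}_{1,+}$). This means that $\frac{2\pi f}{\|\varphi\|}$ is a ch.f. and, moreover, that $f \in (\mathcal{H}_{1,+})_{\|\cdot\|}$.

Conversely, suppose that $f^* := \frac{\varphi}{\|\varphi\|} \in (\mathcal{H}_{1,+})_{\|\cdot\|}$. Then $f^* \in \mathcal{F}$ is symmetric, and the inversion formula can be applied to $\varphi$:

$$f(x) = \frac{1}{2\pi}\int e^{-itx}\varphi(t)\,dt = \frac{\|\varphi\|}{2\pi}\int e^{-itx} f^*(t)\,dt, \qquad \|\varphi\| = \int \varphi(t)\,dt.$$

Since the ch.f. $\varphi^*(t) := \frac{2\pi f(t)}{\|\varphi\|}$ belongs to $\mathcal{H}_{1,+}$, one has $f^* \in (\mathcal{H}_{1,+})^{(-1)}$.

We now prove the second assertion. Suppose that $f \in L_2$. Then $\varphi \in L_2$ and $\varphi^2 \in L_1$. Moreover, by virtue of the symmetry of $f$ and Property 7A in Sect. 7.1, the function $\varphi$ is real-valued, so $\varphi^2 \ge 0$. This implies that $\varphi^2 \in \mathcal{H}_{1,+}$. Since $\varphi^2$ is the ch.f. of the density $f^{(2)*}$, we have $f^{(2)*} \in (\mathcal{H}_{1,+})^{(-1)}$. The theorem is proved.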


Note that any bounded density $f$ belongs to $L_2$. Indeed, since the Lebesgue measure of $\{x : f(x) \ge 1\}$ never exceeds 1, for $f(\cdot) \le N$ we have

$$\|f\|_{L_2}^2 = \int f^2(x)\,dx \le \int_{f(x)<1} f(x)\,dx + N^2\int_{f(x)\ge1} dx \le 1 + N^2.$$

Thus we have obtained the following result.


Corollary 7.2.2 For any bounded symmetric density $f$, the convolution $f^{(2)*}$ is, up to a constant factor, the ch.f. of a random variable.


Example 7.2.1 The "triangle" density

$$g(x) = \begin{cases} 1 - |x| & \text{if } |x| \le 1, \\ 0 & \text{if } |x| > 1, \end{cases}$$

being the convolution of the two uniform distributions on $[-1/2, 1/2]$ (cf. Example 3.6.1) is also a ch.f. We suggest that the reader verify that the preimage of this ch.f. is the density

$$f(x) = \frac{1}{2\pi}\,\frac{\sin^2(x/2)}{(x/2)^2}$$

(the density conjugate to $g$). Conversely, the density $g$ is conjugate to $f$, and the functions $2\pi f(t)$ and $g(t)$ will be ch.f.s for $g$ and $f$, respectively.
These assertions will be useful in Sect. 8.7.
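
The first half of the exercise in Example 7.2.1 can be checked numerically. In the sketch below (an illustration; the function names and the number of nodes are assumptions) the ch.f. of the triangle density $g$ is computed by the trapezoid rule and compared with $2\pi f(t) = \sin^2(t/2)/(t/2)^2$.

```python
import math


def triangle_chf(t, n=2000):
    """Integral of cos(t*x) * (1 - |x|) over [-1, 1] (trapezoid rule); the sine part vanishes by symmetry."""
    h = 2.0 / n
    total = 0.0
    for j in range(n + 1):
        x = -1.0 + j * h
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * math.cos(t * x) * (1 - abs(x))
    return total * h


def two_pi_f(t):
    """2*pi*f(t) for f(x) = (1/(2*pi)) * sin^2(x/2) / (x/2)^2, extended by continuity at t = 0."""
    if t == 0:
        return 1.0
    return (math.sin(t / 2) / (t / 2)) ** 2


for t in (0.0, 0.5, 3.0, 7.0):
    print(t, triangle_chf(t), two_pi_f(t))
```

The two columns agree up to the discretisation error of the trapezoid rule.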




7.3 The Continuity (Convergence) Theorem 


Let $\{\varphi_n(t)\}_{n=1}^{\infty}$ be a sequence of ch.f.s and $\{F_n\}_{n=1}^{\infty}$ the sequence of the respective distribution functions. Recall that the symbol $\Rightarrow$ denotes the weak convergence of distributions introduced in Chap. 6.


Theorem 7.3.1 (The Continuity Theorem) A necessary and sufficient condition for the convergence $F_n \Rightarrow F$ as $n \to \infty$ is that $\varphi_n(t) \to \varphi(t)$ for any $t$, $\varphi(t)$ being the ch.f. corresponding to $F$.


The theorem follows in an obvious way from Corollary 6.3.2 (here two of the 
three sufficient conditions from Corollary 6.3.2 are satisfied: conditions (2) and (3)). 
The proof of the theorem can be obtained in a simpler way as well. This way is 
presented in Sect. 7.4 of the previous editions of this book. 

In Sect. 7.1, for nonnegative random variables $\xi$ we introduced the notion of the Laplace transform $\psi(s) := \mathbf{E}e^{-s\xi}$. Let $\psi_n(s)$ and $\psi(s)$ be Laplace transforms corresponding to $F_n$ and $F$. The following analogue of Theorem 7.3.1 holds for Laplace transforms:

In order that $F_n \Rightarrow F$ as $n \to \infty$ it is necessary and sufficient that $\psi_n(s) \to \psi(s)$ for each $s \ge 0$.

Just as in Theorem 7.3.1, this assertion follows from Corollary 6.3.2, since the class $\{f(x) = e^{-sx},\ s \ge 0\}$ is (like $\{e^{itx}\}$) a distribution determining class (see Property 6 in Sect. 7.1) and, moreover, the sufficient conditions (2) and (3) of Corollary 6.3.2 are satisfied.

Theorem 7.3.1 has a deficiency: one needs to know in advance that the function $\varphi(t)$ to which the ch.f.s converge is a ch.f. itself. However, one could have no such prior information (see e.g. Sect. 8.8). In this connection there arises a natural question: under what conditions will the limiting function $\varphi(t)$ be characteristic?

The answer to this question is given by the following theorem.


Theorem 7.3.2 Let

$$\varphi_n(t) = \int e^{itx}\,dF_n(x)$$

be a sequence of ch.f.s and $\varphi_n(t) \to \varphi(t)$ as $n \to \infty$ for any $t$.
Then the following three conditions are equivalent:

(a) $\varphi(t)$ is a ch.f.;
(b) $\varphi(t)$ is continuous at $t = 0$;
(c) the sequence $\{F_n\}$ is tight.

Thus if we establish that $\varphi_n(t) \to \varphi(t)$ and one of the above three conditions is met, then we can assert that there exists a distribution $F$ such that $\varphi$ is the ch.f. of $F$ and $F_n \Rightarrow F$.




Proof The equivalence of conditions (a) and (c) follows from Theorem 6.3.2. That 
(a) implies (b) is known. It remains to establish that (c) follows from (b). First we 
will show that the following lemma is true. 


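Lemma 7.3.1 If $\varphi$ is the ch.f. of $\xi$ then, for any $u > 0$,

$$\mathbf{P}\left(|\xi| > \frac{2}{u}\right) \le \frac{1}{u}\int_{-u}^{u} \bigl[1 - \varphi(t)\bigr]\,dt.$$

Proof The right-hand side of this inequality is equal to

$$\frac{1}{u}\int_{-u}^{u}\int_{-\infty}^{\infty} \bigl(1 - e^{itx}\bigr)\,dF(x)\,dt,$$

where $F$ is the distribution function of $\xi$. Changing the order of integration and noting that

$$\int_{-u}^{u} \bigl(1 - e^{itx}\bigr)\,dt = \left(t - \frac{e^{itx}}{ix}\right)\Bigg|_{-u}^{u} = 2u\left(1 - \frac{\sin ux}{ux}\right),$$

we obtain that

$$\frac{1}{u}\int_{-u}^{u} \bigl[1 - \varphi(t)\bigr]\,dt = 2\int_{-\infty}^{\infty}\left(1 - \frac{\sin ux}{ux}\right)dF(x) \ge 2\int_{|x|>2/u}\left(1 - \frac{\sin ux}{ux}\right)dF(x) \ge 2\int_{|x|>2/u}\left(1 - \frac{1}{|ux|}\right)dF(x) \ge \int_{|x|>2/u} dF(x).$$

The lemma is proved.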
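
For a concrete feeling of how sharp Lemma 7.3.1 is, one can evaluate both sides for the standard normal law, where everything is available in closed form through the error function. This is an illustrative sketch; the choice of the normal law, the list of values of $u$ and the function names are assumptions.

```python
import math


def tail_prob_normal(u):
    """P(|xi| > 2/u) for a standard normal xi, written via the complementary error function."""
    return math.erfc(math.sqrt(2) / u)


def lemma_bound_normal(u):
    """(1/u) * integral_{-u}^{u} (1 - exp(-t^2/2)) dt, using integral of exp(-t^2/2) = sqrt(2*pi)*erf(u/sqrt(2))."""
    return 2 - math.sqrt(2 * math.pi) * math.erf(u / math.sqrt(2)) / u


for u in (0.1, 0.5, 1.0, 2.0, 5.0, 10.0):
    print(u, tail_prob_normal(u), lemma_bound_normal(u))
```

The bound is far from tight for small $u$, but it is the behaviour of the right-hand side as $u \to 0$ that matters in the proof of tightness.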


Now suppose that condition (b) is met. By Lemma 7.3.1

$$\limsup_{n\to\infty}\int_{|x|>2/u} dF_n(x) \le \limsup_{n\to\infty}\frac{1}{u}\int_{-u}^{u}\bigl[1 - \varphi_n(t)\bigr]\,dt = \frac{1}{u}\int_{-u}^{u}\bigl[1 - \varphi(t)\bigr]\,dt.$$

Since $\varphi(t)$ is continuous at 0 and $\varphi(0) = 1$, the mean value on the right-hand side can clearly be made arbitrarily small by choosing sufficiently small $u$. This obviously means that condition (c) is satisfied. The theorem is proved.


Using ch.f.s one can not only establish convergence of distribution functions but also estimate the rate of this convergence in the cases when one can estimate how fast $\varphi_n - \varphi$ vanishes. We will encounter respective examples in Sect. 7.5.

We will mostly use the machinery of ch.f.s in Chaps. 8, 12 and 17. In the present 
chapter we will also touch upon some applications of ch.f.s, but they will only serve 
as illustrations. 




7.4 The Application of Characteristic Functions in the Proof 
of the Poisson Theorem 


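Let $\xi_1, \ldots, \xi_n$ be independent integer-valued random variables,

$$S_n = \sum_{k=1}^{n}\xi_k, \qquad \mathbf{P}(\xi_k = 1) = p_k, \qquad \mathbf{P}(\xi_k = 0) = 1 - p_k - q_k.$$

The theorem below is a generalisation of the theorems established in Sect. 5.4.⁵


Theorem 7.4.1 One has

$$\bigl|\mathbf{P}(S_n = k) - \Pi_\mu(\{k\})\bigr| \le \sum_{k=1}^{n} p_k^2 + 2\sum_{k=1}^{n} q_k, \quad \text{where } \mu = \sum_{k=1}^{n} p_k.$$

Thus, if one is given a triangle array $\xi_{1n}, \xi_{2n}, \ldots, \xi_{nn}$, $n = 1, 2, \ldots$, of independent integer-valued random variables,

$$S_n = \sum_{k=1}^{n}\xi_{kn}, \qquad \mathbf{P}(\xi_{kn} = 1) = p_{kn}, \qquad \mathbf{P}(\xi_{kn} = 0) = 1 - p_{kn} - q_{kn}, \qquad \mu = \sum_{k=1}^{n} p_{kn},$$

then a sufficient condition for convergence of the difference $\mathbf{P}(S_n = k) - \Pi_\mu(\{k\})$ to zero is that

$$\sum_{k=1}^{n} q_{kn} \to 0, \qquad \sum_{k=1}^{n} p_{kn}^2 \to 0.$$

Since

$$\sum_{k=1}^{n} p_{kn}^2 \le \mu \max_{k\le n} p_{kn},$$

the last condition is always met if

$$\max_{k\le n} p_{kn} \to 0, \qquad \mu \le \mu_0 = \text{const}.$$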
k<n 


⁵This extension is not really substantial since close results could be established using Theorem 5.2.2 in which $\xi_k$ can only take the values 0 and 1. It suffices to observe that the probability of the event $A = \bigcup_k \{\xi_k \ne 0,\ \xi_k \ne 1\}$ is bounded by the sum $\sum q_k$ and therefore

$$\mathbf{P}(S_n = k) = \theta_1\sum q_k + \Bigl(1 - \theta_2\sum q_k\Bigr)\mathbf{P}(S_n = k \mid \overline{A}), \qquad \theta_i \le 1,\ i = 1, 2,$$

where $\mathbf{P}(S_n = k \mid \overline{A}) = \mathbf{P}(S_n^* = k)$ and $S_n^*$ are sums of independent random variables $\xi_k^*$ with

$$\mathbf{P}(\xi_k^* = 1) = p_k^* = \frac{p_k}{1 - q_k}, \qquad \mathbf{P}(\xi_k^* = 0) = 1 - p_k^*.$$

To prove the theorem we will need two auxiliary assertions. 


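Lemma 7.4.1 If $\operatorname{Re}\beta \le 0$ then

$$|e^{\beta} - 1| \le |\beta|, \qquad |e^{\beta} - 1 - \beta| \le |\beta|^2/2, \qquad |e^{\beta} - 1 - \beta - \beta^2/2| \le |\beta|^3/6.$$


Proof The first two inequalities follow from the relations (we use here the change of variables $t = \beta v$ and the fact that $|e^{s}| \le 1$ for $\operatorname{Re}s \le 0$)

$$|e^{\beta} - 1| = \left|\int_0^{\beta} e^{t}\,dt\right| = \left|\beta\int_0^1 e^{\beta v}\,dv\right| \le |\beta|,$$

$$|e^{\beta} - 1 - \beta| = \left|\int_0^{\beta}(e^{t} - 1)\,dt\right| = \left|\beta\int_0^1 (e^{\beta v} - 1)\,dv\right| \le |\beta|^2\int_0^1 v\,dv = |\beta|^2/2.$$

The last inequality is proved in the same way.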


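Lemma 7.4.2 If $|a_k| \le 1$, $|b_k| \le 1$, $k = 1, \ldots, n$, then

$$\left|\prod_{k=1}^{n} a_k - \prod_{k=1}^{n} b_k\right| \le \sum_{k=1}^{n} |a_k - b_k|.$$

Thus if $\varphi_k(t)$ and $\theta_k(t)$ are ch.f.s then, for any $t$,

$$\left|\prod_{k=1}^{n}\varphi_k(t) - \prod_{k=1}^{n}\theta_k(t)\right| \le \sum_{k=1}^{n} |\varphi_k(t) - \theta_k(t)|.$$


Proof Put $A_n = \prod_{k=1}^{n} a_k$ and $B_n = \prod_{k=1}^{n} b_k$. Then $|A_n| \le 1$, $|B_n| \le 1$, and

$$|A_n - B_n| = |A_{n-1}a_n - B_{n-1}b_n| = \bigl|(A_{n-1} - B_{n-1})a_n + (a_n - b_n)B_{n-1}\bigr| \le |A_{n-1} - B_{n-1}| + |a_n - b_n|.$$

Applying this inequality $n$ times, we obtain the required relation.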


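Proof of Theorem 7.4.1 One has

$$\varphi_k(t) := \mathbf{E}e^{it\xi_k} = 1 + p_k(e^{it} - 1) + q_k(\gamma_k(t) - 1),$$

where $\gamma_k(t)$ is the ch.f. of some integer-valued random variable. By independence of the random variables $\xi_k$,

$$\varphi_{S_n}(t) = \prod_{k=1}^{n}\varphi_k(t).$$

Let further $\zeta \in \Pi_\mu$. Then

$$\varphi_\zeta(t) = \mathbf{E}e^{it\zeta} = e^{\mu(e^{it}-1)} = \prod_{k=1}^{n}\theta_k(t),$$

where $\theta_k(t) = e^{p_k(e^{it}-1)}$. Therefore the difference between the ch.f.s $\varphi_{S_n}$ and $\varphi_\zeta$ can be bounded by Lemma 7.4.2 as follows:

$$|\varphi_{S_n}(t) - \varphi_\zeta(t)| = \left|\prod_{k=1}^{n}\varphi_k - \prod_{k=1}^{n}\theta_k\right| \le \sum_{k=1}^{n}|\varphi_k - \theta_k|,$$

where by Lemma 7.4.1 (note that $\operatorname{Re}(e^{it} - 1) \le 0$)

$$\bigl|\theta_k(t) - 1 - p_k(e^{it} - 1)\bigr| \le \frac{p_k^2|e^{it} - 1|^2}{2} = \frac{p_k^2}{2}\bigl(\sin^2 t + (1 - \cos t)^2\bigr) = p_k^2\left(\frac{\sin^2 t}{2} + 2\sin^4\frac{t}{2}\right). \tag{7.4.1}$$

Since $|q_k(\gamma_k(t) - 1)| \le 2q_k$, this gives

$$\sum_{k=1}^{n}|\varphi_k - \theta_k| \le 2\sum_{k=1}^{n} q_k + \sum_{k=1}^{n} p_k^2\left(\frac{\sin^2 t}{2} + 2\sin^4\frac{t}{2}\right).$$

It remains to make use of the inversion formula (7.2.10):

$$\bigl|\mathbf{P}(S_n = k) - \Pi_\mu(\{k\})\bigr| \le \frac{1}{2\pi}\left|\int_{-\pi}^{\pi} e^{-ikt}\bigl(\varphi_{S_n}(t) - \varphi_\zeta(t)\bigr)\,dt\right| \le \frac{1}{2\pi}\int_{-\pi}^{\pi}\left[2\sum_{k=1}^{n} q_k + \sum_{k=1}^{n} p_k^2\left(\frac{\sin^2 t}{2} + 2\sin^4\frac{t}{2}\right)\right]dt = 2\sum_{k=1}^{n} q_k + \sum_{k=1}^{n} p_k^2,$$

for

$$\frac{1}{2\pi}\int_0^{\pi}\sin^2 t\,dt = \frac{1}{4}, \qquad \frac{2}{\pi}\int_0^{\pi}\sin^4\frac{t}{2}\,dt = \frac{3}{4}.$$

The theorem is proved.


If one makes use of the inequality $|e^{it} - 1| \le 2$ in (7.4.1), the computations will be simplified, there will be no need to calculate the last two integrals, but the bounds will be somewhat worse:

$$\sum_{k=1}^{n}|\varphi_k - \theta_k| \le 2\Bigl(\sum q_k + \sum p_k^2\Bigr), \qquad \bigl|\mathbf{P}(S_n = k) - \Pi_\mu(\{k\})\bigr| \le 2\Bigl(\sum q_k + \sum p_k^2\Bigr).$$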
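
The bound of Theorem 7.4.1 can be compared with the actual error on a small example. The sketch below is only an illustration: the success probabilities `ps` are made up, all $q_k$ are taken to be zero (so that the $\xi_k$ are Bernoulli variables), and the function names are assumptions. The exact distribution of $S_n$ is obtained by direct convolution.

```python
import math


def bernoulli_sum_pmf(ps):
    """Exact pmf of a sum of independent Bernoulli(p_k) variables, built by successive convolution."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)      # xi_k = 0
            nxt[k + 1] += mass * p        # xi_k = 1
        pmf = nxt
    return pmf


def poisson_pmf(mu, k):
    """Poisson probability Pi_mu({k})."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


ps = [0.05, 0.1, 0.02, 0.08, 0.03]  # hypothetical p_k; all q_k = 0
mu = sum(ps)
pmf = bernoulli_sum_pmf(ps)
max_gap = max(abs(pmf[k] - poisson_pmf(mu, k)) for k in range(len(pmf)))
bound = sum(p * p for p in ps)      # sum of p_k^2 (the q_k terms vanish)
print(max_gap, bound)
```

In this example the largest pointwise discrepancy is noticeably smaller than $\sum p_k^2$, as one would expect from an upper bound.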


7.5 Characteristic Functions of Multivariate Distributions. 
The Multivariate Normal Distribution 


Definition 7.5.1 Given a random vector $\xi = (\xi_1, \xi_2, \ldots, \xi_d)$, its ch.f. (the ch.f. of its distribution) is defined as the function of the vector variable $t = (t_1, \ldots, t_d)$ equal to

$$\varphi_\xi(t) := \mathbf{E}e^{it\xi^T} = \mathbf{E}e^{i(t,\xi)} = \mathbf{E}\exp\left\{i\sum_{k=1}^{d} t_k\xi_k\right\} = \int \exp\left\{i\sum_{k=1}^{d} t_kx_k\right\}\mathbf{F}_{\xi_1,\ldots,\xi_d}(dx_1, \ldots, dx_d),$$

where $\xi^T$ is the transpose of $\xi$ (a column vector), and $(t, \xi)$ is the inner product.


The ch.f.s of multivariate distributions possess all the properties (with obvious 
amendments of their statements) listed in Sects. 7.1—7.3. 

It is clear that $\varphi_\xi(0) = 1$ and that $|\varphi_\xi(t)| \le 1$ and $\varphi_\xi(-t) = \overline{\varphi_\xi(t)}$ always hold. Further, $\varphi_\xi(t)$ is everywhere continuous. If there exists a mixed moment $\mathbf{E}\xi_1^{k_1}\cdots\xi_d^{k_d}$ then $\varphi_\xi$ has the respective derivative of order $k_1 + \cdots + k_d$:

$$\frac{\partial^{\,k_1+\cdots+k_d}\varphi_\xi(t)}{\partial t_1^{k_1}\cdots\partial t_d^{k_d}}\Bigg|_{t=0} = i^{\,k_1+\cdots+k_d}\,\mathbf{E}\xi_1^{k_1}\cdots\xi_d^{k_d}.$$

If all the moments of some order exist, then an expansion of the function $\varphi_\xi(t)$ similar to (7.1.1) is valid in a neighbourhood of the point $t = 0$.

If $\varphi_\xi(t)$ is known, then the ch.f. of any subcollection of the random variables $(\xi_{k_1}, \ldots, \xi_{k_j})$ can obviously be obtained by setting all $t_k$ except $t_{k_1}, \ldots, t_{k_j}$ to be equal to 0.
The following theorems are simple extensions of their univariate analogues. 


Theorem 7.5.1 (The Inversion Formula) If $\Delta$ is a parallelepiped defined by the inequalities $a_k < x_k < b_k$, $k = 1, \ldots, d$, and the probability $\mathbf{P}(\xi \in \Delta)$ is continuous on the faces of the parallelepiped, then

$$\mathbf{P}(\xi \in \Delta) = \lim_{\sigma\to0}\frac{1}{(2\pi)^d}\int\cdots\int\left(\prod_{k=1}^{d}\frac{e^{-it_ka_k} - e^{-it_kb_k}}{it_k}\,e^{-t_k^2\sigma^2}\right)\varphi_\xi(t)\,dt_1\cdots dt_d.$$

If the random vector $\xi$ has a density $f(x)$ and its ch.f. $\varphi_\xi(t)$ is integrable, then the inversion formula can be written in the form

$$f(x) = \frac{1}{(2\pi)^d}\int e^{-i(t,x)}\varphi_\xi(t)\,dt.$$

If a function $g(x)$ is such that its Fourier transform

$$\widehat{g}(t) = \int e^{i(t,x)}g(x)\,dx$$

is integrable (and this is always the case for sufficiently smooth $g(x)$) then the Parseval equality holds:

$$\mathbf{E}g(\xi) = \mathbf{E}\,\frac{1}{(2\pi)^d}\int e^{-i(t,\xi)}\,\widehat{g}(t)\,dt = \frac{1}{(2\pi)^d}\int \varphi_\xi(-t)\,\widehat{g}(t)\,dt.$$

As before, the inversion formula implies the theorem on one-to-one correspondence between ch.f.s and distribution functions and together with it the fact that $\{e^{i(t,x)}\}$ is a distribution determining class (cf. Definition 6.3.2).

The weak convergence of distributions $\mathbf{F}_{(n)}(B)$ in the $d$-dimensional space to a distribution $\mathbf{F}(B)$ is defined in the same way as in the univariate case: $\mathbf{F}_{(n)} \Rightarrow \mathbf{F}$ if

$$\int f(x)\,\mathbf{F}_{(n)}(dx) \to \int f(x)\,\mathbf{F}(dx)$$

for any continuous and bounded function $f(x)$.
Denote by $\varphi_n(t)$ and $\varphi(t)$ the ch.f.s of distributions $\mathbf{F}_{(n)}$ and $\mathbf{F}$, respectively.


Theorem 7.5.2 (Continuity Theorem) A necessary and sufficient condition for the weak convergence $\mathbf{F}_{(n)} \Rightarrow \mathbf{F}$ is that, for any $t$, $\varphi_n(t) \to \varphi(t)$ as $n \to \infty$.


In the case where one can establish convergence of $\varphi_n(t)$ to some function $\varphi(t)$, there arises the question of whether $\varphi(t)$ will be the ch.f. of some distribution, or, which is the same, whether the sequence $\mathbf{F}_{(n)}$ will converge weakly to some distribution $\mathbf{F}$. Answers to these questions are given by the following assertion. Let $\Delta_N$ be the cube defined by the inequality $\max_k |x_k| < N$.


Theorem 7.5.3 (Continuity Theorem) Suppose a sequence $\varphi_n(t)$ of ch.f.s converges as $n \to \infty$ to a function $\varphi(t)$ for each $t$. Then the following three conditions are equivalent:

(a) $\varphi(t)$ is a ch.f.;
(b) $\varphi(t)$ is continuous at the point $t = 0$;
(c) $\limsup_{n\to\infty}\int_{x \notin \Delta_N}\mathbf{F}_{(n)}(dx) \to 0$ as $N \to \infty$.


All three theorems from this section can be proved in the same way as in the 
univariate case. 


Example 7.5.1 The multivariate normal distribution is defined as a distribution with density (see Sect. 3.3)

$$f_\xi(x) = \frac{|A|^{1/2}}{(2\pi)^{d/2}}\,e^{-Q(x)/2},$$

where

$$Q(x) = xAx^T = \sum_{i,j=1}^{d} a_{ij}x_ix_j,$$

and $|A|$ is the determinant of a positive definite matrix $A = \|a_{ij}\|$.
This is a centred normal distribution for which $\mathbf{E}\xi = 0$. The distribution of the vector $\xi + a$ for any constant vector $a$ is also called normal.
Find the ch.f. of $\xi$. Show that

$$\varphi_\xi(t) = \exp\left\{-\frac{t\sigma^2t^T}{2}\right\}, \tag{7.5.1}$$

where $\sigma^2 = A^{-1}$ is the matrix inverse to $A$ and coinciding with the covariance matrix $\|\sigma_{ij}\|$ of $\xi$:

$$\sigma_{ij} = \mathbf{E}\xi_i\xi_j.$$


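Indeed,

$$\varphi_\xi(t) = \frac{\sqrt{|A|}}{(2\pi)^{d/2}}\int\cdots\int \exp\left\{itx^T - \frac{xAx^T}{2}\right\}dx_1\cdots dx_d. \tag{7.5.2}$$

Choose an orthogonal matrix $C$ such that $CAC^T = D$ is a diagonal matrix, and denote by $\mu_1, \ldots, \mu_d$ the values of its diagonal elements. Change the variables by putting $x = yC$ and $t = vC$. Then

$$|A| = |D| = \prod_{k=1}^{d}\mu_k, \qquad itx^T - \frac{1}{2}xAx^T = ivy^T - \frac{1}{2}yDy^T = i\sum_{k=1}^{d} v_ky_k - \frac{1}{2}\sum_{k=1}^{d}\mu_ky_k^2,$$

and, by Property 2 of ch.f.s of the univariate normal distributions,

$$\varphi_\xi(t) = \frac{\sqrt{|A|}}{(2\pi)^{d/2}}\prod_{k=1}^{d}\int \exp\left\{iv_ky_k - \frac{\mu_ky_k^2}{2}\right\}dy_k = \sqrt{|A|}\prod_{k=1}^{d}\frac{1}{\sqrt{\mu_k}}\exp\left\{-\frac{v_k^2}{2\mu_k}\right\}$$

$$= \exp\left\{-\frac{vD^{-1}v^T}{2}\right\} = \exp\left\{-\frac{tC^TD^{-1}Ct^T}{2}\right\} = \exp\left\{-\frac{tA^{-1}t^T}{2}\right\}.$$

On the other hand, since all the moments of $\xi$ exist, in a neighbourhood of the point $t = 0$ one has

$$\varphi_\xi(t) = 1 - \frac{1}{2}tA^{-1}t^T + o\Bigl(\sum t_k^2\Bigr) = 1 + it\,\mathbf{E}\xi^T - \frac{1}{2}t\sigma^2t^T + o\Bigl(\sum t_k^2\Bigr).$$

From this it follows that $\mathbf{E}\xi = 0$, $A^{-1} = \sigma^2$.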


Formula (7.5.1) that we have just proved implies the following property of normal distributions: the components of the vector $(\xi_1, \ldots, \xi_d)$ are independent if and only if the correlation coefficients $\rho(\xi_i, \xi_j)$ are zero for all $i \ne j$. Indeed, if $\sigma^2$ is a diagonal matrix, then $A = \sigma^{-2}$ is also diagonal and $f_\xi(x)$ is equal to the product of densities. Conversely, if $(\xi_1, \ldots, \xi_d)$ are independent, then $A$ is a diagonal matrix, and hence $\sigma^2$ is also diagonal.




7.6 Other Applications of Characteristic Functions. 
The Properties of the Gamma Distribution 


7.6.1 Stability of the Distributions $\Phi_{\alpha,\sigma^2}$ and $K_{\alpha,\sigma}$


The stability property means, roughly speaking, that the distribution type is pre- 
served under summation of random variables (this description of stability is not 
exact, for more detail see Sect. 8.8). 

The sum of independent normally distributed random variables is also normally distributed. Indeed, let $\xi_1$ and $\xi_2$ be independent and normally distributed with parameters $(a_1, \sigma_1^2)$ and $(a_2, \sigma_2^2)$, respectively. Then the ch.f. of $\xi_1 + \xi_2$ is equal to

$$\varphi_{\xi_1+\xi_2}(t) = \varphi_{\xi_1}(t)\,\varphi_{\xi_2}(t) = \exp\left\{ita_1 - \frac{t^2\sigma_1^2}{2}\right\}\exp\left\{ita_2 - \frac{t^2\sigma_2^2}{2}\right\} = \exp\left\{it(a_1 + a_2) - \frac{t^2}{2}(\sigma_1^2 + \sigma_2^2)\right\}.$$

Thus the sum $\xi_1 + \xi_2$ is again a normal random variable, with parameters $(a_1 + a_2, \sigma_1^2 + \sigma_2^2)$.

Normality is also preserved when taking sums of dependent random variables (components of an arbitrary normally distributed random vector). This immediately follows from the form of the ch.f. of the multivariate normal law found in Sect. 7.5. One just has to note that to get the ch.f. of the sum $\xi_1 + \cdots + \xi_n$ it suffices to put $t_1 = \cdots = t_n = t$ in the expression

$$\varphi_{(\xi_1,\ldots,\xi_n)}(t_1, \ldots, t_n) = \mathbf{E}\exp\{it_1\xi_1 + \cdots + it_n\xi_n\}.$$

The sum of independent random variables distributed according to the Poisson law also has a Poisson distribution. Indeed, consider two independent random variables $\xi_1 \in \Pi_{\lambda_1}$ and $\xi_2 \in \Pi_{\lambda_2}$. The ch.f. of their sum is equal to

$$\varphi_{\xi_1+\xi_2}(t) = \exp\{\lambda_1(e^{it} - 1)\}\exp\{\lambda_2(e^{it} - 1)\} = \exp\{(\lambda_1 + \lambda_2)(e^{it} - 1)\}.$$

Therefore $\xi_1 + \xi_2 \in \Pi_{\lambda_1+\lambda_2}$.
The sum of independent random variables distributed according to the Cauchy law also has a Cauchy distribution. Indeed, if $\xi_1 \in K_{\alpha_1,\sigma_1}$ and $\xi_2 \in K_{\alpha_2,\sigma_2}$, then

$$\varphi_{\xi_1+\xi_2}(t) = \exp\{i\alpha_1t - \sigma_1|t|\}\exp\{i\alpha_2t - \sigma_2|t|\} = \exp\{i(\alpha_1 + \alpha_2)t - (\sigma_1 + \sigma_2)|t|\};$$

$$\xi_1 + \xi_2 \in K_{\alpha_1+\alpha_2,\,\sigma_1+\sigma_2}.$$

The above assertions are closely related to the fact that the normal and Poisson laws are, as we saw, limiting laws for sums of independent random variables (the Cauchy distribution has the same property, see Sect. 8.8). Indeed, if $S_{2n}/\sqrt{2n}$ converges in distribution to a normal law (where $S_k = \sum_{j=1}^{k}\xi_j$, $\xi_j$ are independent and identically distributed) then it is clear that $S_n/\sqrt{n}$ and $(S_{2n} - S_n)/\sqrt{n}$ will also converge to a normal law so that the sum of two asymptotically normal random variables also has to be asymptotically normal.

Note, however, that due to its arithmetic structure the random variable $\xi \in \Pi_\lambda$ (as opposed to $\xi \in \Phi_{\alpha,\sigma^2}$ or $\xi \in K_{\alpha,\sigma}$) cannot be transformed by any normalisation (linear transformation) into a random variable again having the Poisson distribution but with another parameter. For this reason the Poisson distribution cannot be stable in the sense of Definition 8.8.2.

It is not hard to see that the other distributions we have met do not possess the above-mentioned property of preservation of the distribution type under summation of random variables. If, for instance, $\xi_1$ and $\xi_2$ are uniformly distributed over $[0, 1]$ and independent then $F_{\xi_1}$ and $F_{\xi_1+\xi_2}$ are substantially different functions (see Example 3.6.1).


7.6.2 The Γ-distribution and its properties


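In this subsection we will consider one more rather wide-spread type of distribution closely related to the normal distribution and frequently used in applications. This is the so-called Pearson gamma distribution $\Gamma_{\alpha,\lambda}$. We will write $\xi \in \Gamma_{\alpha,\lambda}$ if $\xi$ has density

$$f(x; \alpha, \lambda) = \begin{cases} \dfrac{\alpha^{\lambda}}{\Gamma(\lambda)}\,x^{\lambda-1}e^{-\alpha x}, & x \ge 0, \\ 0, & x < 0, \end{cases}$$

depending on two parameters $\alpha > 0$ and $\lambda > 0$, where $\Gamma(\lambda)$ is the gamma function

$$\Gamma(\lambda) = \int_0^{\infty} x^{\lambda-1}e^{-x}\,dx, \qquad \lambda > 0.$$

It follows from this equality that $\int_0^{\infty} f(x; \alpha, \lambda)\,dx = 1$ (one needs to make the variable change $\alpha x = y$). If one differentiates the ch.f.

$$\varphi(t) = \varphi(t; \alpha, \lambda) = \frac{\alpha^{\lambda}}{\Gamma(\lambda)}\int_0^{\infty} x^{\lambda-1}e^{itx-\alpha x}\,dx$$

with respect to $t$ and then integrates by parts, the result will be

$$\varphi'(t) = \frac{\alpha^{\lambda}}{\Gamma(\lambda)}\int_0^{\infty} ix^{\lambda}e^{itx-\alpha x}\,dx = \frac{\alpha^{\lambda}}{\Gamma(\lambda)}\,\frac{i\lambda}{\alpha - it}\int_0^{\infty} x^{\lambda-1}e^{itx-\alpha x}\,dx = \frac{i\lambda}{\alpha - it}\,\varphi(t);$$

$$\bigl(\ln\varphi(t)\bigr)' = \bigl(-\lambda\ln(\alpha - it)\bigr)', \qquad \varphi(t) = c(\alpha - it)^{-\lambda}.$$

Since $\varphi(0) = 1$ one has $c = \alpha^{\lambda}$ and $\varphi(t) = (1 - it/\alpha)^{-\lambda}$.

It follows from the form of the ch.f. that the subfamily of distributions $\Gamma_{\alpha,\lambda}$ for a fixed $\alpha$ also has a certain stability property: if $\xi_1 \in \Gamma_{\alpha,\lambda_1}$ and $\xi_2 \in \Gamma_{\alpha,\lambda_2}$ are independent, then $\xi_1 + \xi_2 \in \Gamma_{\alpha,\lambda_1+\lambda_2}$.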
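
The closed form $(1 - it/\alpha)^{-\lambda}$ can be compared with a direct numerical evaluation of the defining integral. The sketch below is illustrative only: the parameter values, the truncation point `upper`, the number of nodes and the function names are assumptions, and $\lambda > 1$ is taken so that the integrand is bounded at zero.

```python
import cmath
import math


def gamma_chf_numeric(t, alpha, lam, upper=40.0, n=40000):
    """Trapezoid approximation of (alpha^lam / Gamma(lam)) * integral_0^upper x^(lam-1) e^{(it-alpha)x} dx."""
    h = upper / n
    total = 0j
    for j in range(n + 1):
        x = j * h
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * x ** (lam - 1) * cmath.exp((1j * t - alpha) * x)
    return alpha ** lam / math.gamma(lam) * total * h


def gamma_chf_closed(t, alpha, lam):
    """Closed form (1 - i*t/alpha)^(-lam); the principal branch is the right one since Re(1 - it/alpha) > 0."""
    return (1 - 1j * t / alpha) ** (-lam)


alpha, lam, t = 2.0, 3, 1.5  # hypothetical parameter values
numeric = gamma_chf_numeric(t, alpha, lam)
closed = gamma_chf_closed(t, alpha, lam)
print(numeric, closed)

# Stability for a fixed alpha: the product of two ch.f.s is again of the same form.
product = gamma_chf_closed(t, alpha, 1.5) * gamma_chf_closed(t, alpha, 2.5)
print(product, gamma_chf_closed(t, alpha, 4.0))
```

The last two printed values illustrate the stability property for fixed $\alpha$: multiplying the ch.f.s adds the parameters $\lambda_1$ and $\lambda_2$.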




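An example of a particular gamma distribution is given, for instance, by the distribution of the random variable

$$\chi_n^2 = \sum_{i=1}^{n}\xi_i^2,$$

where $\xi_i$ are independent and normally distributed with parameters (0, 1). This is the so-called chi-squared distribution with $n$ degrees of freedom playing an important role in statistics.

To find the distribution of $\chi_n^2$ it suffices to note that, by virtue of the equality

$$\mathbf{P}(\chi_1^2 < x) = \mathbf{P}(|\xi_1| < \sqrt{x}) = \frac{2}{\sqrt{2\pi}}\int_0^{\sqrt{x}} e^{-u^2/2}\,du,$$

the density of $\chi_1^2$ is equal to

$$\frac{1}{\sqrt{2\pi}}\,e^{-x/2}x^{-1/2} = f(x; 1/2, 1/2), \qquad \Gamma(1/2) = \sqrt{\pi}.$$

This means that the ch.f. of $\chi_n^2$ is

$$\varphi^n(t; 1/2, 1/2) = (1 - 2it)^{-n/2} = \varphi(t; 1/2, n/2)$$

and corresponds to the density $f(x; 1/2, n/2)$.
Another special case of the gamma distribution is the exponential distribution $\Gamma_\alpha = \Gamma_{\alpha,1}$ with density

$$f(x; \alpha, 1) = \alpha e^{-\alpha x}, \qquad x \ge 0,$$

and characteristic function

$$\varphi(t; \alpha, 1) = \left(1 - \frac{it}{\alpha}\right)^{-1}.$$

We leave it to the reader to verify with the help of ch.f.s that if $\xi_j \in \Gamma_{\alpha_j}$ and are independent, $\alpha_j \ne \alpha_l$ for $j \ne l$, then

$$\mathbf{P}\left(\sum_{j=1}^{n}\xi_j > x\right) = \sum_{j=1}^{n} e^{-\alpha_jx}\prod_{l\ne j}\left(1 - \frac{\alpha_j}{\alpha_l}\right)^{-1}.$$

In various applications (in particular, in queueing theory, cf. Sect. 12.4), the so-called Erlang distribution is also of importance. This is a distribution with density $f(x; \alpha, \lambda)$ for integer $\lambda$. The Erlang distribution is clearly a $\lambda$-fold convolution of the exponential distribution with itself.

We find the expectation and variance of a random variable $\xi$ that has the gamma distribution with parameters $(\alpha, \lambda)$:

$$\mathbf{E}\xi = -i\varphi'(0; \alpha, \lambda) = \frac{\lambda}{\alpha}, \qquad \mathbf{E}\xi^2 = -\varphi''(0; \alpha, \lambda) = \frac{\lambda(\lambda + 1)}{\alpha^2},$$

$$\operatorname{Var}(\xi) = \frac{\lambda(\lambda + 1)}{\alpha^2} - \left(\frac{\lambda}{\alpha}\right)^2 = \frac{\lambda}{\alpha^2}.$$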




Distributions from the gamma family, and especially the exponential ones, are 
often (and justifiably) used to approximate distributions in various applied problems. 
We will present three relevant examples. 


Example 7.6.1 Consider a complex device. The failure of at least one of n parts 
comprising the device means the breakdown of the whole device. The lifetime dis- 
tribution of any of the parts is usually well described by the exponential law. (The 
reasons for this could be understood with the help of the Poisson theorem on rare 
events. See also Example 2.4.1 and Chap. 19.) 

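Thus if the lifetimes $\xi_j$ of the parts are independent, and for the part number $j$
one has
\[ \mathbf{P}(\xi_j > x) = e^{-\alpha_j x}, \qquad x \ge 0, \]
then the lifetime of the whole device will be equal to $\eta_n = \min(\xi_1, \dots, \xi_n)$ and we
will get
\[ \mathbf{P}(\eta_n > x) = \mathbf{P}\Bigl(\bigcap_{j=1}^n \{\xi_j > x\}\Bigr) = \prod_{j=1}^n \mathbf{P}(\xi_j > x) = \exp\Bigl\{-x \sum_{i=1}^n \alpha_i\Bigr\}. \]

This means that $\eta_n$ will also have the exponential distribution, and since
$\mathbf{E}\xi_j = 1/\alpha_j$, the mean failure-free operation time of the device will be equal to
\[ \mathbf{E}\eta_n = \Bigl(\sum_{i=1}^n \frac{1}{\mathbf{E}\xi_i}\Bigr)^{-1}. \]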
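
The formula for the mean lifetime can be checked by simulation. The following is a minimal sketch, not part of the original text: the three failure rates, the number of trials and the seed are hypothetical values chosen only for illustration.

```python
import random

def mean_min_lifetime(rates, trials=100_000, seed=0):
    """Monte Carlo estimate of E min(xi_1, ..., xi_n) for independent
    exponential lifetimes with the given rates alpha_j."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += min(rng.expovariate(a) for a in rates)
    return total / trials

rates = [0.5, 1.0, 2.5]              # hypothetical rates alpha_j
simulated = mean_min_lifetime(rates)
# (sum_j 1 / E xi_j)^(-1) = 1 / sum_j alpha_j
theoretical = 1.0 / sum(rates)
print(simulated, theoretical)
```

The two printed numbers should agree up to the Monte Carlo error, which is of order $1/\sqrt{\text{trials}}$.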


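Example 7.6.2 Now turn to the distribution of $\zeta_n = \max(\xi_1, \dots, \xi_n)$, where $\xi_j$ are
independent and all have the $\boldsymbol\Gamma$-distribution with parameters $(\alpha, \lambda)$. We could
consider, for instance, a queueing system with $n$ channels. (That could be, say, a
multiprocessor computer solving a problem using the complete enumeration algorithm,
each of the processors of the machine checking a separate variant.) Channel number
$i$ is busy for a random time $\xi_i$. After what time will the whole system be free? This
random time will clearly have the same distribution as $\zeta_n$.
Since the $\xi_i$ are independent, we have
\[ \mathbf{P}(\zeta_n < x) = \mathbf{P}\Bigl(\bigcap_{j=1}^n \{\xi_j < x\}\Bigr) = \bigl[\mathbf{P}(\xi_1 < x)\bigr]^n. \tag{7.6.1} \]

If $n$ is large, then for approximate calculations we could find the limiting
distribution of $\zeta_n$ as $n \to \infty$. Note that, for any fixed $x$, $\mathbf{P}(\zeta_n < x) \to 0$ as $n \to \infty$.
Assuming for simplicity that $\alpha = 1$ (the general case can be reduced to this one
by changing the scale), we apply L'Hospital's rule to see that, as $x \to \infty$,
\[ \mathbf{P}(\xi_j > x) = \int_x^\infty \frac{1}{\Gamma(\lambda)}\, y^{\lambda-1} e^{-y}\, dy \sim \frac{x^{\lambda-1}}{\Gamma(\lambda)}\, e^{-x}. \]

Letting $n \to \infty$ and
\[ x = x(n) = \ln\bigl[n (\ln n)^{\lambda-1} / \Gamma(\lambda)\bigr] + u, \qquad u = \text{const}, \]
we get
\[ \mathbf{P}(\xi_j > x) \sim \frac{(\ln n)^{\lambda-1}}{\Gamma(\lambda)} \cdot \frac{\Gamma(\lambda)}{n (\ln n)^{\lambda-1}}\, e^{-u} = \frac{e^{-u}}{n}. \]

Therefore for such $x$ and $n \to \infty$ we obtain by (7.6.1) that
\[ \mathbf{P}(\zeta_n < x) = \Bigl(1 - \frac{e^{-u}}{n}\,(1 + o(1))\Bigr)^n \to e^{-e^{-u}}. \]

Thus we have established the existence of the limit
\[ \lim_{n \to \infty} \mathbf{P}\Bigl(\zeta_n - \ln\Bigl[\frac{n (\ln n)^{\lambda-1}}{\Gamma(\lambda)}\Bigr] < u\Bigr) = e^{-e^{-u}} \]
or, which is the same, that
\[ \zeta_n - \ln\Bigl[\frac{n (\ln n)^{\lambda-1}}{\Gamma(\lambda)}\Bigr] \Rightarrow \mathbf{F}_0, \qquad F_0(u) = e^{-e^{-u}}. \]

In other words, for large $n$ the variable $\zeta_n$ admits the representation
\[ \zeta_n \approx \ln\Bigl[\frac{n (\ln n)^{\lambda-1}}{\Gamma(\lambda)}\Bigr] + \zeta^0, \qquad \text{where } \zeta^0 \in \mathbf{F}_0. \]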
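
For integer $\lambda$ (the Erlang case with $\alpha = 1$) the tail $\mathbf{P}(\xi_j > x) = e^{-x} \sum_{k<\lambda} x^k/k!$ is explicit, so the quality of the limiting approximation can be examined without simulation. The sketch below is an illustration under that assumption; the function names are hypothetical.

```python
import math

def erlang_tail(x, lam):
    """P(xi > x) for the gamma law with alpha = 1 and integer lam."""
    return math.exp(-x) * sum(x**k / math.factorial(k) for k in range(lam))

def max_cdf(n, lam, u):
    """Exact P(zeta_n - c_n < u), c_n = ln[n (ln n)^(lam-1) / Gamma(lam)]."""
    c_n = math.log(n) + (lam - 1) * math.log(math.log(n)) - math.lgamma(lam)
    x = c_n + u
    # (1 - tail)^n, computed through log1p to avoid cancellation for large n
    return math.exp(n * math.log1p(-erlang_tail(x, lam)))

def limit_cdf(u):
    """The limiting distribution function F_0(u) = exp(-exp(-u))."""
    return math.exp(-math.exp(-u))

for lam in (1, 2):
    for n in (10**3, 10**6, 10**9):
        print(lam, n, max_cdf(n, lam, 0.0), limit_cdf(0.0))
```

For $\lambda = 1$ the agreement is already very close for moderate $n$, whereas for $\lambda = 2$ the error decays only at a logarithmic rate, because the relation $x^{\lambda-1} \sim (\ln n)^{\lambda-1}$ used in the derivation is itself slow to take effect.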


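Example 7.6.3 Let $\xi_1$ and $\xi_2$ be independent with $\xi_1 \in \boldsymbol\Gamma_{\alpha,\lambda_1}$ and $\xi_2 \in \boldsymbol\Gamma_{\alpha,\lambda_2}$. What
is the distribution of $\xi_1/(\xi_1 + \xi_2)$? We will make use of Theorem 4.9.2. Since the
joint density $f(x, y)$ of $\xi_1$ and $\eta = \xi_1 + \xi_2$ is equal to
\[ f(x, y) = f(x; \alpha, \lambda_1)\, f(y - x; \alpha, \lambda_2), \]
the density of $\eta$ is
\[ q(y) = f(y; \alpha, \lambda_1 + \lambda_2), \]
and the conditional density $f(x \mid y)$ of $\xi_1$ given $\eta = y$ is equal to
\[ f(x \mid y) = \frac{f(x, y)}{q(y)} = \frac{\Gamma(\lambda_1 + \lambda_2)}{\Gamma(\lambda_1)\Gamma(\lambda_2)} \cdot \frac{x^{\lambda_1 - 1} (y - x)^{\lambda_2 - 1}}{y^{\lambda_1 + \lambda_2 - 1}}, \qquad x \in [0, y]. \]
By the formulas from Sect. 3.2 the conditional density of $\xi_1/y = \xi_1/(\xi_1 + \xi_2)$ (given
the same condition $\xi_1 + \xi_2 = y$) is equal to
\[ y f(yx \mid y) = \frac{\Gamma(\lambda_1 + \lambda_2)}{\Gamma(\lambda_1)\Gamma(\lambda_2)}\, x^{\lambda_1 - 1} (1 - x)^{\lambda_2 - 1}, \qquad x \in [0, 1]. \]

This distribution does not depend on $y$ (nor on $\alpha$). Hence the unconditional density
of $\xi_1/(\xi_1 + \xi_2)$ will have the same form, too. We obtain the so-called beta
distribution $\mathbf{B}_{\lambda_1,\lambda_2}$ with parameters $\lambda_1$ and $\lambda_2$ defined on the interval $[0, 1]$. In particular,
for $\lambda_1 = \lambda_2 = 1$, the distribution is uniform: $\mathbf{B}_{1,1} = \mathbf{U}_{0,1}$.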
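
The independence of the result from $\alpha$ can be illustrated numerically. The sketch below is an illustration only: the parameter values, sample size and seeds are hypothetical. It samples the ratio for two different values of $\alpha$ and compares the sample mean and variance with those of the beta law, $\lambda_1/(\lambda_1+\lambda_2)$ and $\lambda_1\lambda_2/\bigl((\lambda_1+\lambda_2)^2(\lambda_1+\lambda_2+1)\bigr)$.

```python
import random

def ratio_sample(alpha, lam1, lam2, size, seed):
    """Sample xi_1 / (xi_1 + xi_2) for independent gamma variables."""
    rng = random.Random(seed)
    out = []
    for _ in range(size):
        # random.gammavariate takes (shape, scale); the scale is 1 / alpha
        x1 = rng.gammavariate(lam1, 1.0 / alpha)
        x2 = rng.gammavariate(lam2, 1.0 / alpha)
        out.append(x1 / (x1 + x2))
    return out

def mean_and_var(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / len(xs)

lam1, lam2 = 2.0, 3.0
beta_mean = lam1 / (lam1 + lam2)
beta_var = lam1 * lam2 / ((lam1 + lam2) ** 2 * (lam1 + lam2 + 1.0))
stats = {a: mean_and_var(ratio_sample(a, lam1, lam2, 50_000, seed=1))
         for a in (0.7, 5.0)}
print(stats, (beta_mean, beta_var))
```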




7.7 Generating Functions. Application to Branching Processes. 
A Problem on Extinction 


7.7.1 Generating Functions 


We already know that if a random variable $\xi$ is integer-valued, i.e.
\[ \mathbf{P}\Bigl(\bigcup_k \{\xi = k\}\Bigr) = 1, \]
then the ch.f. $\varphi_\xi(t)$ will actually be a function of $z = e^{it}$, and, along with its ch.f.,
the distribution of $\xi$ can be specified by its generating function
\[ p_\xi(z) := \mathbf{E} z^\xi = \sum_k z^k\, \mathbf{P}(\xi = k). \]
The inversion formula can be written here as
\[ \mathbf{P}(\xi = k) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-itk} \varphi_\xi(t)\, dt = \frac{1}{2\pi i} \oint_{|z|=1} z^{-k-1} p_\xi(z)\, dz. \tag{7.7.1} \]

As was already noted (see Sect. 7.2), relation (7.7.1) is simply the formula for
Fourier coefficients (since $e^{itk} = \cos tk + i \sin tk$).

If $\xi$ and $\eta$ are independent random variables, then the distribution of $\xi + \eta$ will
be given by the convolution of the sequences $\mathbf{P}(\xi = k)$ and $\mathbf{P}(\eta = k)$:
\[ \mathbf{P}(\xi + \eta = n) = \sum_{k=-\infty}^{\infty} \mathbf{P}(\xi = k)\, \mathbf{P}(\eta = n - k) \]
(the total probability formula). To this convolution there corresponds the product of
the generating functions:
\[ p_{\xi+\eta}(z) = p_\xi(z)\, p_\eta(z). \]
It is clear from the examples considered in Sect. 7.1 that the generating functions of
random variables distributed according to the Bernoulli and Poisson laws are
\[ p_\xi(z) = 1 + p(z - 1), \qquad p_\xi(z) = \exp\{\mu(z - 1)\}, \]
respectively.

One can see from the definition of the generating function that, for a nonnegative
random variable $\xi \ge 0$, the function $p_\xi(z)$ is defined for $|z| \le 1$ and is analytic in
the domain $|z| < 1$.
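
The correspondence between convolutions and products of generating functions is easy to verify numerically for distributions with finite support. The following sketch is illustrative only; the choice of a Bernoulli and a binomial law with $p = 0.3$ and the test point $z$ are hypothetical.

```python
from math import comb

def convolve(p, q):
    """Distribution of xi + eta for independent xi, eta with pmfs p, q on 0, 1, 2, ..."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r

def gen_fun(pmf, z):
    """Generating function E z^xi = sum_k z^k P(xi = k)."""
    return sum(pk * z**k for k, pk in enumerate(pmf))

bern = [0.7, 0.3]                                   # Bernoulli law, p = 0.3
binom3 = [comb(3, k) * 0.3**k * 0.7**(3 - k) for k in range(4)]
total = convolve(bern, binom3)                      # law of the sum
z = 0.4 + 0.2j                                      # any point with |z| <= 1
print(gen_fun(total, z), gen_fun(bern, z) * gen_fun(binom3, z))
```

Here `total` coincides with the binomial distribution with parameters $(4, 0.3)$, in agreement with $p_{\xi+\eta}(z) = (1 + p(z-1))^4$.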


7.7.2 The Simplest Branching Processes 


Now we turn to sequences of random variables which describe the so-called branching
processes. We have already encountered a simple example of such a process
when describing a chain reaction scheme in Example 4.4.4. Consider a more general
scheme of a branching process. Imagine particles that can produce other particles
of the same type; these could be neutrons in chain reactions, bacteria reproducing
according to certain laws etc. Assume that initially there is a single particle (the
“null generation”) that, as a result of a “division” act, transforms with probabilities
$f_k$, $k = 0, 1, 2, \dots$, into $k$ particles of the same type,
\[ \sum_{k=0}^{\infty} f_k = 1. \]


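The new particles form the “first generation”. Each of the particles from that
generation behaves in the same way as the initial particle, independently of what
happened before and of the other particles from that generation. Thus we obtain the
“second generation”, and so on. Denote by $\zeta_n$ the number of particles in the $n$-th
generation. To describe the sequence $\zeta_n$, introduce, as we did in Example 4.4.4,
independent sequences of independent identically distributed random variables
\[ \bigl\{\xi_j^{(1)}\bigr\}_{j=1}^{\infty}, \quad \bigl\{\xi_j^{(2)}\bigr\}_{j=1}^{\infty}, \quad \dots, \]
where $\xi_j^{(n)}$ have the distribution
\[ \mathbf{P}\bigl(\xi_j^{(n)} = k\bigr) = f_k, \qquad k = 0, 1, \dots. \]

Then the sequence $\zeta_n$ can be represented as
\[ \zeta_0 = 1, \quad \zeta_1 = \xi_1^{(1)}, \quad \zeta_2 = \xi_1^{(2)} + \dots + \xi_{\zeta_1}^{(2)}, \quad \dots, \quad \zeta_n = \xi_1^{(n)} + \dots + \xi_{\zeta_{n-1}}^{(n)}. \]
These are sums of random numbers of random variables. Since $\xi_1^{(n)}, \xi_2^{(n)}, \dots$ do not
depend on $\zeta_{n-1}$, for the generating function $f_{(n)}(z) = \mathbf{E} z^{\zeta_n}$ we obtain by the total
probability formula that
\[ f_{(n)}(z) = \sum_{k=0}^{\infty} \mathbf{P}(\zeta_{n-1} = k)\, \mathbf{E} z^{\xi_1^{(n)} + \dots + \xi_k^{(n)}} = \sum_{k=0}^{\infty} \mathbf{P}(\zeta_{n-1} = k)\, f^k(z) = f_{(n-1)}\bigl(f(z)\bigr), \tag{7.7.2} \]
where
\[ f(z) := f_{(1)}(z) = \mathbf{E} z^{\xi_1^{(n)}} = \sum_{k=0}^{\infty} f_k z^k. \]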



Fig. 7.1 Finding the extinction probability of a branching process: it is given by the smaller of the two solutions to the equation $z = f(z)$


Denote by $f_n(z)$ the $n$-th iterate of the function $f(z)$, i.e. $f_1(z) = f(z)$, $f_2(z) = f(f(z))$,
$f_3(z) = f(f_2(z))$ and so on. Then we conclude from (7.7.2) by induction
that the generating function of $\zeta_n$ is equal to the $n$-th iterate of $f(z)$:
\[ \mathbf{E} z^{\zeta_n} = f_n(z). \]

From this one can easily obtain, by differentiating at the point $z = 1$, recursive
relations for the moments of $\zeta_n$.

How can one find the extinction probability of the process? By extinction we will
understand the event that all $\zeta_n$ starting from some $n$ will be equal to 0. (If $\zeta_n = 0$
then clearly $\zeta_{n+1} = \zeta_{n+2} = \dots = 0$, because $\mathbf{P}(\zeta_{n+1} = 0 \mid \zeta_n = 0) = 1$.) Set $A_k =
\{\zeta_k = 0\}$. Then extinction is the event $\bigcup_{k=1}^{\infty} A_k$. Since $A_n \subset A_{n+1}$, the extinction
probability $q$ is equal to $q = \lim_{n \to \infty} \mathbf{P}(A_n)$.


Theorem 7.7.1 The extinction probability q is equal to the smallest nonnegative 
solution of the equation q = f (q). 


Proof One has $\mathbf{P}(A_n) = f_n(0) \le 1$, and this sequence is non-decreasing (since $A_n \subset A_{n+1}$). Passing in
the equality
\[ f_{n+1}(0) = f\bigl(f_n(0)\bigr) \tag{7.7.3} \]
to the limit as $n \to \infty$, we obtain
\[ q = f(q), \qquad q \le 1. \]

This is an equation for the extinction probability. Let us analyse its solutions. The
function $f(z)$ is convex (as $f''(z) \ge 0$) and non-decreasing in the domain $z \ge 0$
and $f'(1) = m$ is the mean number of offspring of a single particle. First assume
that $\mathbf{P}(\xi_1^{(1)} = 1) < 1$. If $m \le 1$ then $f(z) > z$ for $z < 1$ and hence $q = 1$. If $m > 1$
then by convexity of $f$ the equation $q = f(q)$ has exactly two solutions on the
interval $[0, 1]$: $q_1 < 1$ and $q_2 = 1$ (see Fig. 7.1). Assume that $q = q_2 = 1$. Then the
sequence $\delta_n = 1 - f_n(0)$ will monotonically converge to 0, and $f(1 - \delta_n) < 1 - \delta_n$
for sufficiently large $n$. Therefore, for such $n$,
\[ \delta_{n+1} = 1 - f(1 - \delta_n) > \delta_n, \]
which is a contradiction as $\delta_n$ is a decreasing sequence. This means that $q = q_1 < 1$.
Finally, in the case $\mathbf{P}(\xi_1^{(1)} = 1) = f_1 = 1$ one clearly has $f(z) \equiv z$ and $q = 0$. The
theorem is proved.
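
The proof suggests a simple numerical procedure: since $\mathbf{P}(A_n) = f_n(0)$ increases to $q$, one can iterate $q \mapsto f(q)$ starting from 0. The sketch below illustrates this; the two offspring distributions, the tolerance and the iteration cap are hypothetical choices.

```python
def offspring_gf(f, z):
    """Generating function f(z) = sum_k f_k z^k of the offspring law."""
    return sum(fk * z**k for k, fk in enumerate(f))

def extinction_probability(f, tol=1e-13, max_iter=1_000_000):
    """Limit of f_n(0) = P(A_n), computed by iterating q -> f(q) from q = 0."""
    q = 0.0
    for _ in range(max_iter):
        q_next = offspring_gf(f, q)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q

supercritical = [0.25, 0.25, 0.5]   # m = 1.25 > 1
subcritical = [0.5, 0.25, 0.25]     # m = 0.75 < 1
print(extinction_probability(supercritical))
print(extinction_probability(subcritical))
```

For the first law the equation $z = f(z)$ reads $z^2 - 1.5z + 0.5 = 0$, with roots $1/2$ and $1$, so the iteration should settle at the smaller root $1/2$; for the second law it should approach 1. In the critical case $m = 1$ the same iteration converges only at the slow rate described in Theorem 7.7.2.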


Now consider in more detail the case $m = 1$, which is called critical. We know
that in this case the extinction probability $q$ equals 1. Let $q_n = \mathbf{P}(A_n) = f_n(0)$ be
the probability of extinction by time $n$. How fast does $q_n$ converge to 1? By (7.7.3)
one has $q_{n+1} = f(q_n)$. Therefore the probability $p_n = 1 - q_n$ of non-extinction of
the process by time $n$ satisfies the relation
\[ p_{n+1} = g(p_n), \qquad g(x) = 1 - f(1 - x). \]

It is also clear that $\gamma_n = p_n - p_{n+1}$ is the probability that extinction will occur
on step $n$.


Theorem 7.7.2 If $m = f'(1) = 1$ and $0 < b := f''(1) < \infty$ then $\gamma_n \sim \frac{2}{bn^2}$ and
$p_n \sim \frac{2}{bn}$ as $n \to \infty$.

Proof If the second moment of the number of offspring of a single particle is finite
($b < \infty$) then the derivative $g''(0) = -b$ exists and therefore, since $g(0) = 0$ and
$g'(0) = f'(1) = 1$, one has
\[ g(x) = x - \frac{b}{2} x^2 + o(x^2), \qquad x \to 0. \]

Putting $x = p_n \to 0$, we find for the sequence $a_n = 1/p_n$ that
\[ a_{n+1} - a_n = \frac{p_n - p_{n+1}}{p_n p_{n+1}} = \frac{b p_n^2 (1 + o(1))}{2 p_n^2 (1 - b p_n/2 + o(p_n))} \to \frac{b}{2}, \]
\[ a_n = a_1 + \sum_{k=1}^{n-1} (a_{k+1} - a_k) \sim \frac{bn}{2}, \qquad p_n \sim \frac{2}{bn}. \]
Consequently, $\gamma_n = p_n - g(p_n) = \frac{b}{2} p_n^2 (1 + o(1)) \sim \frac{2}{bn^2}$.

The theorem is proved.
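
The asymptotics of Theorem 7.7.2 can be observed directly from the recursion $p_{n+1} = g(p_n)$. The sketch below uses the offspring law $f(z) = (1 + z^2)/2$, chosen here purely as an illustration; for it $m = f'(1) = 1$, $b = f''(1) = 1$ and $g(x) = x - x^2/2$.

```python
def survival_probabilities(n_max):
    """p_n = 1 - f_n(0), n = 0..n_max, for the critical law f(z) = (1 + z^2) / 2."""
    p = [1.0]                        # p_0 = 1: the initial particle is present
    for _ in range(n_max):
        x = p[-1]
        p.append(x - 0.5 * x * x)    # g(x) = 1 - f(1 - x) = x - x^2 / 2
    return p

b = 1.0
p = survival_probabilities(10_001)
for n in (10, 100, 1_000, 10_000):
    gamma_n = p[n] - p[n + 1]
    # both ratios should approach 1 if p_n ~ 2/(bn) and gamma_n ~ 2/(bn^2)
    print(n, p[n] * b * n / 2.0, gamma_n * b * n * n / 2.0)
```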


Now consider the problem on the distribution of the number $\zeta_n$ of particles given
$\zeta_n > 0$.


Theorem 7.7.3 Under the assumptions of Theorem 7.7.2, the conditional distribution
of $p_n \zeta_n$ (or $2\zeta_n/(bn)$) given $\zeta_n > 0$ converges as $n \to \infty$ to the exponential
distribution:
\[ \mathbf{P}(p_n \zeta_n > x \mid \zeta_n > 0) \to e^{-x}, \qquad x > 0. \]

The above statement means, in particular, that given $\zeta_n > 0$, the number of
particles $\zeta_n$ is of order $n$ as $n \to \infty$.




Proof Consider the Laplace transform (see Property 6 in Sect. 7.1) of the conditional
distribution of $p_n \zeta_n$ (given $\zeta_n > 0$):
\[ \mathbf{E}\bigl(e^{-s p_n \zeta_n} \mid \zeta_n > 0\bigr) = \frac{1}{p_n} \sum_{k=1}^{\infty} e^{-s k p_n}\, \mathbf{P}(\zeta_n = k). \tag{7.7.4} \]
We will make use of the fact that, if we could find an $N$ such that $e^{-s p_n} = 1 - p_N$,
which is the probability of extinction by time $N$, then the right-hand side of (7.7.4)
will give, by the total probability formula, the conditional probability of the extinction
of the process by time $n + N$ given its non-extinction at time $n$. We can evaluate
this probability using Theorem 7.7.2.

Since $p_n \to 0$, for any fixed $s > 0$ one has
\[ e^{-s p_n} - 1 \sim -s p_n \sim -\frac{2s}{bn}. \]
Clearly, one can always choose $N \sim n/s$, $s_n \sim s$, $s_n \downarrow s$ such that $e^{-s_n p_n} - 1 = -p_N$.
Therefore $e^{-s_n p_n k} = (1 - p_N)^k$ and the right-hand side of (7.7.4) can be rewritten
for $s = s_n$ as
\[ \frac{1}{p_n} \sum_{k=1}^{\infty} \mathbf{P}(\zeta_n = k)(1 - p_N)^k = \frac{1}{p_n}\, \mathbf{P}(\zeta_n > 0,\ \zeta_{n+N} = 0) = \frac{p_n - p_{n+N}}{p_n} \]
\[ = 1 - \frac{p_{n+N}}{p_n} \sim 1 - \frac{n}{n+N} = \frac{N}{n+N} \to \frac{1}{1+s}. \]

Now note that
\[ \mathbf{E}\bigl(e^{-s p_n \zeta_n} \mid \zeta_n > 0\bigr) - \mathbf{E}\bigl(e^{-s_n p_n \zeta_n} \mid \zeta_n > 0\bigr) = \mathbf{E}\bigl[e^{-s p_n \zeta_n}\bigl(1 - e^{-(s_n - s) p_n \zeta_n}\bigr) \mid \zeta_n > 0\bigr]. \]
Since $e^{-\alpha} \le 1$ and $1 - e^{-\alpha} \le \alpha$ for $\alpha \ge 0$, and $\mathbf{E}\zeta_n = 1$, $\mathbf{E}(\zeta_n \mid \zeta_n > 0) = 1/p_n$, it is
easily seen that the positive (since $s_n > s$) difference of the expectations in the last
formula does not exceed
\[ (s_n - s)\, p_n\, \mathbf{E}(\zeta_n \mid \zeta_n > 0) = s_n - s \to 0. \]

Therefore the Laplace transform (7.7.4) converges, as $n \to \infty$, to $1/(1 + s)$.
Since $1/(1 + s)$ is the Laplace transform of the exponential distribution:
\[ \int_0^{\infty} e^{-sx - x}\, dx = \frac{1}{1+s}, \]


we conclude by the continuity theorem (see the remark after Theorem 7.3.1 in 
Sect. 7.3) that the conditional distribution of interest converges to the exponential 
law.° 

In Sect. 15.4 (Example 15.4.1) we will obtain, as consequences of martingale
convergence theorems, assertions about the behaviour of $\zeta_n$ as $n \to \infty$ for branching
processes in the case $\mu > 1$ (the so-called supercritical processes).


The simple proof of Theorem 7.7.3 that we presented here is due to K.A. Borovkov. 


Chapter 8 
Sequences of Independent Random Variables. 
Limit Theorems 


Abstract The chapter opens with proofs of Khintchin’s (weak) Law of Large Numbers
(Sect. 8.1) and the Central Limit Theorem (Sect. 8.2) in the case of independent
identically distributed summands, both using the apparatus of characteristic func- 
tions. Section 8.3 establishes general conditions for the Weak Law of Large Num- 
bers for general sequences of independent random variables and also conditions for 
the respective convergence in mean. Section 8.4 presents the Central Limit Theo- 
rem in the triangular array scheme (the Lindeberg—Feller theorem) and its corollar- 
ies, illustrated by several insightful examples. After that, in Sect. 8.5 an alternative 
method of compositions is introduced and used to prove the Central Limit Theo- 
rem in the same situation, establishing an upper bound for the convergence rate for 
the uniform distance between the distribution functions in the case of finite third 
moments. This is followed by an extension of the above results to the multivariate 
case in Sect. 8.6. Section 8.7 presents important material not to be found in other 
textbooks: the so-called integro-local limit theorems on convergence to the normal 
distribution (the Stone—Shepp and Gnedenko theorems), including versions for sums 
of random variables depending on a parameter. These results will be of crucial im- 
portance in Chap. 9, when proving theorems on exact asymptotic behaviour of large 
deviation probabilities. The chapter concludes with Sect. 8.8 establishing integral, 
integro-local and local theorems on convergence of the distributions of scaled sums 
of independent identically distributed random variables to non-normal stable laws.


8.1 The Law of Large Numbers 


Theorem 8.1.1 (Khintchin’s Law of Large Numbers) Let $\{\xi_n\}_{n=1}^{\infty}$ be a sequence
of independent identically distributed random variables having a finite expectation
$\mathbf{E}\xi_n = a$ and let $S_n := \xi_1 + \dots + \xi_n$. Then
\[ \frac{S_n}{n} \xrightarrow{p} a \quad \text{as } n \to \infty. \]


The above assertion together with Theorems 6.1.6 and 6.1.7 imply the following. 


A.A. Borovkov, Probability Theory, Universitext,
DOI 10.1007/978-1-4471-5201-9_8, © Springer-Verlag London 2013




Corollary 8.1.1 Under the conditions of Theorem 8.1.1, as well as convergence of
$S_n/n$ in probability, convergence in mean also takes place:
\[ \mathbf{E}\Bigl|\frac{S_n}{n} - a\Bigr| \to 0 \quad \text{as } n \to \infty. \]

Note that the condition of independence of $\xi_k$ and the very assertion of the
theorem assume that all the random variables $\xi_k$ are given on a common probability
space.

From the physical point of view, the stated law of large numbers is the simplest
ergodic theorem which means, roughly speaking, that for random variables
their “time averages” and “space averages” coincide. This applies to an even greater
extent to the strong law of large numbers, by virtue of which $S_n/n \to a$ with
probability 1.

Under more strict assumptions (existence of variance) Theorem 8.1.1 was ob- 
tained in Sect. 4.7 as a consequence of Chebyshev’s inequality. 


Proof of Theorem 8.1.1 We have to prove that, for any $\varepsilon > 0$,
\[ \mathbf{P}\Bigl(\Bigl|\frac{S_n}{n} - a\Bigr| > \varepsilon\Bigr) \to 0 \]
as $n \to \infty$. The above relation is equivalent to the weak convergence of distributions
$S_n/n \Rightarrow \mathbf{I}_a$. Therefore, by the continuity theorem and Example 7.1.1 it suffices to
show that, for any fixed $t$,
\[ \varphi_{S_n/n}(t) \to e^{iat}. \]

The ch.f. $\varphi(t)$ of the random variable $\xi_k$ has, in a certain neighbourhood of 0, the
property $|\varphi(t) - 1| < 1/2$. Therefore for such $t$ one can define the function $l(t) =
\ln \varphi(t)$ (we take the principal value of the logarithm). Since $\xi_k$ has finite expectation,
the derivative
\[ l'(0) = \frac{\varphi'(0)}{\varphi(0)} = ia \]
exists. For each fixed $t$ and sufficiently large $n$, the value of $l(t/n)$ is defined and
\[ \varphi_{S_n/n}(t) = \varphi^n(t/n) = e^{l(t/n)\, n}. \]

Since $l(0) = 0$, one has
\[ e^{l(t/n)\, n} = \exp\Bigl\{ t\, \frac{l(t/n) - l(0)}{t/n} \Bigr\} \to e^{l'(0) t} = e^{iat} \]
as $n \to \infty$. The theorem is proved.
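
The convergence $\varphi^n(t/n) \to e^{iat}$ used in the proof can be watched numerically for any law with a known ch.f. The sketch below takes, as an illustrative assumption, exponential summands with rate $\alpha$, whose ch.f. is $(1 - it/\alpha)^{-1}$ and whose mean is $a = 1/\alpha$; the values of $\alpha$ and $t$ are hypothetical.

```python
import cmath

def phi_exp(t, alpha):
    """Ch.f. of the exponential law with rate alpha: (1 - it/alpha)^(-1)."""
    return 1.0 / (1.0 - 1j * t / alpha)

def phi_mean(t, n, alpha):
    """Ch.f. of S_n / n for n i.i.d. exponential summands: phi(t/n)^n."""
    return phi_exp(t / n, alpha) ** n

alpha, t = 2.0, 3.0
a = 1.0 / alpha                       # expectation of a single summand
limit = cmath.exp(1j * a * t)         # ch.f. of the degenerate law I_a
errors = {n: abs(phi_mean(t, n, alpha) - limit) for n in (10, 1_000, 100_000)}
print(errors)
```

The errors decay roughly like $1/n$ in this example because the summands have a finite variance; the theorem itself requires only a finite expectation and gives no rate.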




8.2 The Central Limit Theorem for Identically Distributed 
Random Variables 


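Let, as before, $\{\xi_n\}$ be a sequence of independent identically distributed random
variables. But now we assume, along with the expectation $\mathbf{E}\xi_n = a$, the existence
of the variance $\operatorname{Var} \xi_n = \sigma^2$. We retain the notation $S_n = \xi_1 + \dots + \xi_n$ for sums of
our random variables and $\Phi(x)$ for the normal distribution function with parameters
$(0, 1)$. Introduce the sequence of random variables
\[ \zeta_n = \frac{S_n - an}{\sigma \sqrt{n}}. \]


Theorem 8.2.1 If $0 < \sigma^2 < \infty$, then $\mathbf{P}(\zeta_n < x) \to \Phi(x)$ uniformly in $x$ ($-\infty <
x < \infty$) as $n \to \infty$.


In such a case, the sequence $\{\zeta_n\}$ is said to be asymptotically normal.

It follows from $\zeta_n \Rightarrow \zeta \in \boldsymbol\Phi_{0,1}$, $\zeta_n^2 \ge 0$, $\mathbf{E}\zeta_n^2 = \mathbf{E}\zeta^2 = 1$ and from Lemma 6.2.3
that the sequence $\{\zeta_n^2\}$ is uniformly integrable. Therefore, as well as the weak
convergence $\zeta_n \Rightarrow \zeta$, $\zeta \in \boldsymbol\Phi_{0,1}$ ($\mathbf{E}f(\zeta_n) \to \mathbf{E}f(\zeta)$ for any bounded continuous
$f$), one also has convergence $\mathbf{E}f(\zeta_n) \to \mathbf{E}f(\zeta)$ for any continuous $f$ such that
$|f(x)| < c(1 + x^2)$ (see Theorem 6.2.3).


Proof of Theorem 8.2.1 The uniform convergence is a consequence of the weak
convergence and continuity of $\Phi(x)$. Further, we may assume without loss of
generality that $a = 0$, for otherwise we could consider the sequence $\{\xi_k' = \xi_k - a\}_{k=1}^{\infty}$
without changing the sequence $\{\zeta_n\}$. Therefore, to prove the required convergence,
it suffices to show that $\varphi_{\zeta_n}(t) \to e^{-t^2/2}$ when $a = 0$. We have
\[ \varphi_{\zeta_n}(t) = \varphi^n\Bigl(\frac{t}{\sigma\sqrt{n}}\Bigr), \qquad \text{where } \varphi(t) = \varphi_{\xi_k}(t). \]

Since $\mathbf{E}\xi_k^2$ exists, $\varphi''(t)$ also exists and, as $t \to 0$, one has
\[ \varphi(t) = \varphi(0) + t\varphi'(0) + \frac{t^2}{2}\varphi''(0) + o(t^2) = 1 - \frac{t^2\sigma^2}{2} + o(t^2). \tag{8.2.1} \]

Therefore, as $n \to \infty$,
\[ \ln \varphi_{\zeta_n}(t) = n \ln\Bigl[1 - \frac{\sigma^2}{2}\Bigl(\frac{t}{\sigma\sqrt{n}}\Bigr)^2 + o\Bigl(\frac{t^2}{n}\Bigr)\Bigr] = n\Bigl[-\frac{t^2}{2n} + o\Bigl(\frac{t^2}{n}\Bigr)\Bigr] = -\frac{t^2}{2} + o(1) \to -\frac{t^2}{2}. \]

The theorem is proved.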
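
The uniform convergence asserted in Theorem 8.2.1 can be quantified exactly when the summands are Bernoulli variables, since then $S_n$ is binomial. The sketch below is an illustration only; the success probability $p = 0.3$ is a hypothetical choice. It computes $\sup_x |\mathbf{P}(\zeta_n < x) - \Phi(x)|$ by inspecting both one-sided limits at every atom of $\zeta_n$.

```python
import math

def std_normal_cdf(x):
    """Phi(x), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binom_pmf(n, k, p):
    """Binomial probability computed through logarithms to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def sup_distance(n, p):
    """sup_x |P(zeta_n < x) - Phi(x)| for sums of n Bernoulli(p) variables."""
    a, sigma = p, math.sqrt(p * (1.0 - p))
    cdf_left, worst = 0.0, 0.0
    for k in range(n + 1):
        x = (k - a * n) / (sigma * math.sqrt(n))     # atom of zeta_n
        cdf_right = cdf_left + binom_pmf(n, k, p)    # limit from the right
        phi = std_normal_cdf(x)
        worst = max(worst, abs(cdf_left - phi), abs(cdf_right - phi))
        cdf_left = cdf_right
    return worst

distances = {n: sup_distance(n, 0.3) for n in (10, 100, 1_000)}
print(distances)
```

Since the distribution function of $\zeta_n$ is a step function and $\Phi$ is increasing, the supremum is attained at one of the one-sided limits examined in the loop.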




8.3 The Law of Large Numbers for Arbitrary Independent 
Random Variables 


Now we proceed to elucidating conditions under which the law of large numbers and
the central limit theorem will hold in the case when $\xi_k$ are independent but not
necessarily identically distributed. The problem will not become more complicated if,
from the very beginning, we consider a more general situation where one is given an
arbitrary series $\xi_{1,n}, \dots, \xi_{n,n}$, $n = 1, 2, \dots$ of independent random variables, where
the distributions of $\xi_{k,n}$ may depend on $n$. This is the so-called triangular array
scheme.
Put
\[ \zeta_n := \sum_{k=1}^{n} \xi_{k,n}. \]
From the viewpoint of the results to follow, we can assume without loss of generality
that
\[ \mathbf{E}\xi_{k,n} = 0. \tag{8.3.1} \]

Assume that the following condition is met: as $n \to \infty$,
\[ D_1 := \sum_{k=1}^{n} \mathbf{E} \min\bigl(|\xi_{k,n}|, |\xi_{k,n}|^2\bigr) \to 0. \tag*{[D$_1$]} \]

Theorem 8.3.1 (The Law of Large Numbers) If conditions (8.3.1) and [D$_1$] are
satisfied, then $\zeta_n \Rightarrow \mathbf{I}_0$ or, which is the same, $\zeta_n \xrightarrow{p} 0$ as $n \to \infty$.


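Example 8.3.1 Assume that the $\xi_k$ do not depend on $n$, $\mathbf{E}\xi_k = 0$ and $\mathbf{E}|\xi_k|^s \le m_s <
\infty$ for $1 < s \le 2$. For such $s$, there exists a sequence $b(n) = o(n)$ such that $n =
o(b^s(n))$. Since, for $\xi_{k,n} = \xi_k/b(n)$,
\[ \mathbf{E} \min\bigl(|\xi_{k,n}|, \xi_{k,n}^2\bigr) = \mathbf{E}\Bigl[\Bigl|\frac{\xi_k}{b(n)}\Bigr|^2;\ |\xi_k| \le b(n)\Bigr] + \mathbf{E}\Bigl[\Bigl|\frac{\xi_k}{b(n)}\Bigr|;\ |\xi_k| > b(n)\Bigr] \]
\[ \le \mathbf{E}\Bigl[\Bigl|\frac{\xi_k}{b(n)}\Bigr|^s;\ |\xi_k| \le b(n)\Bigr] + \mathbf{E}\Bigl[\Bigl|\frac{\xi_k}{b(n)}\Bigr|^s;\ |\xi_k| > b(n)\Bigr] \le m_s b^{-s}(n), \]
we have
\[ D_1 \le n\, m_s\, b^{-s}(n) \to 0, \]
and hence $S_n/b(n) \xrightarrow{p} 0$.
A more general sufficient condition (compared to $m_s < \infty$) for the law of large
numbers is contained in Theorem 8.3.3 below. Theorem 8.1.1 is an evident corollary
of that theorem.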




Now consider condition [D$_1$] in more detail. It can clearly also be written in the
form
\[ D_1 = \sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_{k,n}|;\ |\xi_{k,n}| > 1\bigr) + \sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_{k,n}|^2;\ |\xi_{k,n}| \le 1\bigr) \to 0. \]
Next introduce the condition
\[ M_1 := \sum_{k=1}^{n} \mathbf{E}|\xi_{k,n}| \le c < \infty \tag{8.3.2} \]
and the condition
\[ M_1(\tau) := \sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_{k,n}|;\ |\xi_{k,n}| > \tau\bigr) \to 0 \tag*{[M$_1$]} \]
for any $\tau > 0$ as $n \to \infty$. Condition [M$_1$] could be called a Lindeberg type condition
(the Lindeberg condition [M$_2$] will be introduced in Sect. 8.4).

The following lemma explains the relationship between the introduced conditions.


Lemma 8.3.1 1. $\{[\mathrm{M}_1] \cap (8.3.2)\} \subset [\mathrm{D}_1]$. 2. $[\mathrm{D}_1] \subset [\mathrm{M}_1]$.


That is, conditions [M$_1$] and (8.3.2) imply [D$_1$], and condition [D$_1$] implies
[M$_1$].

It follows from Lemma 8.3.1 that under condition (8.3.2), conditions [D$_1$] and
[M$_1$] are equivalent.


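Proof of Lemma 8.3.1 1. Let conditions (8.3.2) and [M$_1$] be met. Then, for
\[ \tau \le 1, \qquad g_1(x) = \min(|x|, |x|^2), \]
one has
\[ D_1 = \sum_{k=1}^{n} \mathbf{E} g_1(\xi_{k,n}) \le \sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_{k,n}|;\ |\xi_{k,n}| > \tau\bigr) + \sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_{k,n}|^2;\ |\xi_{k,n}| \le \tau\bigr) \]
\[ \le M_1(\tau) + \tau \sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_{k,n}|;\ |\xi_{k,n}| \le \tau\bigr) \le M_1(\tau) + \tau M_1(0). \tag{8.3.3} \]

Since $M_1(0) = M_1 \le c$ and $\tau$ can be arbitrary small, we have $D_1 \to 0$ as $n \to \infty$.
2. Conversely, let condition [D$_1$] be met. Then, for $\tau \le 1$,
\[ M_1(\tau) \le \sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_{k,n}|;\ |\xi_{k,n}| > 1\bigr) + \tau^{-1} \sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_{k,n}|^2;\ \tau < |\xi_{k,n}| \le 1\bigr) \le \tau^{-1} D_1 \to 0 \tag{8.3.4} \]
as $n \to \infty$ for any $\tau > 0$. The lemma is proved.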


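Let us show that condition [M$_1$] (as well as [D$_1$]) is essential for the law of large
numbers to hold.
Consider the random variables
\[ \xi_{k,n} = \begin{cases} 1 - \frac{1}{n} & \text{with probability } \frac{1}{n}, \\ -\frac{1}{n} & \text{with probability } 1 - \frac{1}{n}. \end{cases} \]
For them, $\mathbf{E}\xi_{k,n} = 0$, $\mathbf{E}|\xi_{k,n}| = \frac{2(n-1)}{n^2} \sim \frac{2}{n}$, $M_1 \le 2$, condition (8.3.2) is met, but
$M_1(\tau) \ge \frac{n-1}{n} > \frac{1}{2}$ for $n > 2$, $\tau < 1/2$, and thus condition [M$_1$] is not satisfied. Here
the number $\nu_n$ of positive $\xi_{k,n}$, $1 \le k \le n$, converges in distribution to a random
variable $\nu$ having the Poisson distribution with parameter $\mu = 1$. The sum of the
remaining $\xi_{k,n}$s is equal to $-\frac{n - \nu_n}{n} \xrightarrow{p} -1$. Therefore, $\zeta_n + 1 \Rightarrow \boldsymbol\Pi_1$, and the law
of large numbers does not hold.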
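
In this example $\zeta_n = \nu_n (1 - 1/n) - (n - \nu_n)/n = \nu_n - 1$ exactly, where $\nu_n$ is binomial with parameters $(n, 1/n)$, so the failure of the law of large numbers can be seen from a one-line computation. The sketch below is an illustration with a hypothetical function name; it evaluates $\mathbf{P}(|\zeta_n| > 1/2) = \mathbf{P}(\nu_n \ne 1)$.

```python
import math

def prob_far_from_zero(n):
    """P(|zeta_n| > 1/2) = P(nu_n != 1) for nu_n ~ Binomial(n, 1/n), n >= 2."""
    # P(nu_n = 1) = n * (1/n) * (1 - 1/n)^(n-1) = (1 - 1/n)^(n-1)
    p_one = math.exp((n - 1) * math.log1p(-1.0 / n))
    return 1.0 - p_one

for n in (10, 1_000, 10**6):
    print(n, prob_far_from_zero(n))
print("limit:", 1.0 - math.exp(-1.0))
```

The probabilities approach $1 - e^{-1} \approx 0.632$ rather than 0, in agreement with the Poisson limit $\zeta_n + 1 \Rightarrow \boldsymbol\Pi_1$.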
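Each of the conditions [D$_1$] and [M$_1$] imply the uniform smallness of $\mathbf{E}|\xi_{k,n}|$:
\[ \max_{1 \le k \le n} \mathbf{E}|\xi_{k,n}| \to 0 \quad \text{as } n \to \infty. \tag{8.3.5} \]

Indeed, equation [M$_1$] means that there exists a sufficiently slowly decreasing
sequence $\tau_n \to 0$ such that $M_1(\tau_n) \to 0$. Therefore
\[ \max_{k \le n} \mathbf{E}|\xi_{k,n}| \le \max_{k \le n} \bigl[\tau_n + \mathbf{E}\bigl(|\xi_{k,n}|;\ |\xi_{k,n}| > \tau_n\bigr)\bigr] \le \tau_n + M_1(\tau_n) \to 0. \tag{8.3.6} \]

In particular, (8.3.5) implies the negligibility of the summands $\xi_{k,n}$.
We will say that $\xi_{k,n}$ are negligible, or, equivalently, have property [S], if, for any
$\varepsilon > 0$,
\[ \max_{k \le n} \mathbf{P}\bigl(|\xi_{k,n}| > \varepsilon\bigr) \to 0 \quad \text{as } n \to \infty. \tag*{[S]} \]

Property [S] could also be called uniform convergence of $\xi_{k,n}$ in probability to
zero. Property [S] follows immediately from (8.3.5) and Chebyshev’s inequality. It
also follows from stronger relations implied by [M$_1$]:
\[ \mathbf{P}\Bigl(\max_{k \le n} |\xi_{k,n}| > \varepsilon\Bigr) = \mathbf{P}\Bigl(\bigcup_{k \le n} \{|\xi_{k,n}| > \varepsilon\}\Bigr) \le \sum_{k \le n} \mathbf{P}\bigl(|\xi_{k,n}| > \varepsilon\bigr) \le \varepsilon^{-1} \sum_{k \le n} \mathbf{E}\bigl(|\xi_{k,n}|;\ |\xi_{k,n}| > \varepsilon\bigr) \to 0. \tag*{[S$_1$]} \]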


k<n k<n 


We now turn to proving the law of large numbers. We will give two versions of
the proof. The first one illustrates the classical method of characteristic functions.
The second version is based on elementary inequalities and leads to a stronger
assertion about convergence in mean.¹
Here is the first version.


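Proof of Theorem 8.3.1² Put
\[ \varphi_{k,n}(t) := \mathbf{E} e^{it\xi_{k,n}}, \qquad \Delta_k(t) := \varphi_{k,n}(t) - 1. \]

One has to prove that, for each $t$,
\[ \varphi_{\zeta_n}(t) = \mathbf{E} e^{it\zeta_n} = \prod_{k=1}^{n} \varphi_{k,n}(t) \to 1 \]
as $n \to \infty$. By Lemma 7.4.2
\[ \bigl|\varphi_{\zeta_n}(t) - 1\bigr| = \Bigl|\prod_{k=1}^{n} \varphi_{k,n}(t) - \prod_{k=1}^{n} 1\Bigr| \le \sum_{k=1}^{n} |\Delta_k(t)| = \sum_{k=1}^{n} \bigl|\mathbf{E} e^{it\xi_{k,n}} - 1\bigr| = \sum_{k=1}^{n} \bigl|\mathbf{E}\bigl(e^{it\xi_{k,n}} - 1 - it\xi_{k,n}\bigr)\bigr|. \]
By Lemma 7.4.1 we have (for $g_1(x) = \min(|x|, x^2)$)
\[ \bigl|e^{itx} - 1 - itx\bigr| \le \min\bigl(2|tx|,\ t^2x^2/2\bigr) \le 2 g_1(tx) \le 2h(t) g_1(x), \]
where $h(t) = \max(|t|, |t|^2)$. Therefore
\[ \bigl|\varphi_{\zeta_n}(t) - 1\bigr| \le 2h(t) \sum_{k=1}^{n} \mathbf{E} g_1(\xi_{k,n}) = 2h(t) D_1 \to 0. \]

The theorem is proved.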


The last inequality shows that $|\varphi_{\zeta_n}(t) - 1|$ admits a bound in terms of $D_1$. It
turns out that $\mathbf{E}|\zeta_n|$ also admits a bound in terms of $D_1$. Now we will give the
second version of the proof that actually leads to a stronger variant of the law of
large numbers.


Theorem 8.3.2 Under conditions (8.3.1) and [D$_1$] one has $\mathbf{E}|\zeta_n| \to 0$ (i.e.
$\zeta_n \to 0$ in mean).


¹The second version was communicated to us by A.I. Sakhanenko.


²There exists an alternative “direct” proof of Theorem 8.3.1 using not ch.f.s but the so-called
truncated random variables and estimates of their variances. However, because of what follows, it
is more convenient for us to use here the machinery of ch.f.s.




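The assertion of Theorem 8.3.2 clearly means the uniform integrability of $\{\zeta_n\}$;
it implies Theorem 8.3.1, for
\[ \mathbf{P}\bigl(|\zeta_n| > \varepsilon\bigr) \le \mathbf{E}|\zeta_n| / \varepsilon \to 0 \quad \text{as } n \to \infty. \]


Proof of Theorem 8.3.2 Put
\[ \xi_{k,n}' := \begin{cases} \xi_{k,n} & \text{if } |\xi_{k,n}| \le 1, \\ 0 & \text{otherwise,} \end{cases} \]
and $\xi_{k,n}'' := \xi_{k,n} - \xi_{k,n}'$. Then $\xi_{k,n} = \xi_{k,n}' + \xi_{k,n}''$ and $\zeta_n = \zeta_n' + \zeta_n''$ with an obvious
convention for the notations $\zeta_n'$, $\zeta_n''$. By the Cauchy–Bunjakovsky inequality,
\[ \mathbf{E}|\zeta_n| \le \mathbf{E}\bigl|\zeta_n' - \mathbf{E}\zeta_n'\bigr| + \mathbf{E}\bigl|\zeta_n'' - \mathbf{E}\zeta_n''\bigr| \le \sqrt{\mathbf{E}\bigl(\zeta_n' - \mathbf{E}\zeta_n'\bigr)^2} + \mathbf{E}|\zeta_n''| + |\mathbf{E}\zeta_n''| \]
\[ \le \sqrt{\sum_{k=1}^{n} \operatorname{Var}(\xi_{k,n}')} + 2 \sum_{k=1}^{n} \mathbf{E}|\xi_{k,n}''| \le \sqrt{\sum_{k=1}^{n} \mathbf{E}(\xi_{k,n}')^2} + 2 \sum_{k=1}^{n} \mathbf{E}|\xi_{k,n}''| \]
\[ = \Bigl[\sum_{k=1}^{n} \mathbf{E}\bigl(\xi_{k,n}^2;\ |\xi_{k,n}| \le 1\bigr)\Bigr]^{1/2} + 2 \sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_{k,n}|;\ |\xi_{k,n}| > 1\bigr) \le \sqrt{D_1} + 2D_1 \to 0, \]
if $D_1 \to 0$. The theorem is proved.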
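
The bound $\mathbf{E}|\zeta_n| \le \sqrt{D_1} + 2D_1$ obtained in the proof can be checked exactly in a simple triangular array. The sketch below assumes, purely for illustration, that $\xi_{k,n} = X_k/n$ with independent symmetric signs $\mathbf{P}(X_k = \pm 1) = 1/2$, so that (8.3.1) holds and $D_1 = n \cdot n^{-2} = 1/n$.

```python
import math

def mean_abs_zeta(n):
    """E|zeta_n| for zeta_n = (X_1 + ... + X_n) / n with P(X_k = +1) = P(X_k = -1) = 1/2."""
    total = 0.0
    for k in range(n + 1):                      # k = number of +1 values
        log_w = (math.lgamma(n + 1) - math.lgamma(k + 1)
                 - math.lgamma(n - k + 1) - n * math.log(2.0))
        total += abs(2 * k - n) / n * math.exp(log_w)
    return total

def d1(n):
    """D_1 for this array: |xi_{k,n}| = 1/n <= 1, so min(|x|, x^2) = 1/n^2."""
    return 1.0 / n

for n in (1, 10, 100, 1_000):
    bound = math.sqrt(d1(n)) + 2.0 * d1(n)
    print(n, mean_abs_zeta(n), bound)
```

In this example only the truncated part $\zeta_n'$ is present, so the first term $\sqrt{D_1}$ of the bound already dominates $\mathbf{E}|\zeta_n|$.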


Remark 8.3.1 It can be seen from the proof of Theorem 8.3.2 that the argument will
remain valid if we replace the independence of $\xi_{k,n}$ by the weaker condition that
$\xi_{k,n}'$ are non-correlated. It will also be valid if $\xi_{k,n}'$ are only weakly correlated so that
\[ \mathbf{E}\bigl(\zeta_n' - \mathbf{E}\zeta_n'\bigr)^2 \le c \sum_{k=1}^{n} \operatorname{Var}(\xi_{k,n}'), \qquad c < \infty. \]

If $\{\xi_k\}$ is a given fixed (not dependent on $n$) sequence of independent random
variables, $S_n = \sum_{k=1}^{n} \xi_k$ and $\mathbf{E}\xi_k = a_k$, then one looks at the applicability of the law
of large numbers to the sequences
\[ \xi_{k,n} = \frac{\xi_k - a_k}{b(n)}, \qquad \zeta_n = \sum_{k=1}^{n} \xi_{k,n} = \frac{1}{b(n)}\Bigl(S_n - \sum_{k=1}^{n} a_k\Bigr), \tag{8.3.7} \]
where $\xi_{k,n}$ satisfy (8.3.1), and $b(n)$ is an unboundedly increasing sequence. In some
cases it is natural to take $b(n) = \sum_{k=1}^{n} \mathbf{E}|\xi_k|$ if this sum increases unboundedly.
Without loss of generality we can set $a_k = 0$. The next assertion follows from
Theorem 8.3.2.

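Corollary 8.3.1 If, as $n \to \infty$,
\[ D_1 := \frac{1}{b(n)} \sum_{k=1}^{n} \mathbf{E} \min\bigl(|\xi_k|,\ \xi_k^2 / b(n)\bigr) \to 0 \]
or, for any $\tau > 0$,
\[ M_1(\tau) = \frac{1}{b(n)} \sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_k|;\ |\xi_k| > \tau b(n)\bigr) \to 0, \qquad b(n) = \sum_{k=1}^{n} \mathbf{E}|\xi_k| \to \infty, \tag{8.3.8} \]
then $\mathbf{E}|\zeta_n| \to 0$.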


Now we will present an important sufficient condition for the law of large numbers
that is very close to condition (8.3.8) and which explains to some extent its
essence. In addition, in many cases this condition is easier to check. Let $b_k = \mathbf{E}|\xi_k|$,
$\bar{b}_n = \max_{k \le n} b_k$, and, as before,
\[ S_n = \sum_{k=1}^{n} \xi_k, \qquad b(n) = \sum_{k=1}^{n} b_k. \]


The following assertion is a direct generalisation of Theorem 8.1.1 and Corol- 
lary 8.1.1. 


Theorem 8.3.3 Let $\mathbf{E}\xi_k = 0$, the sequence of normalised random variables $\xi_k/b_k$
be uniformly integrable and $\bar{b}_n = o(b(n))$ as $n \to \infty$. Then
\[ \mathbf{E}\,\frac{|S_n|}{b(n)} \to 0. \]


If $b_k \le b < \infty$ then $b(n) \le bn$ and $\mathbf{E}|S_n|/n \to 0$.


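Proof Since
\[ \mathbf{E}\bigl(|\xi_k|;\ |\xi_k| > \tau b(n)\bigr) \le b_k\, \mathbf{E}\Bigl(\Bigl|\frac{\xi_k}{b_k}\Bigr|;\ \Bigl|\frac{\xi_k}{b_k}\Bigr| > \tau\, \frac{b(n)}{\bar{b}_n}\Bigr) \tag{8.3.9} \]
and $b(n)/\bar{b}_n \to \infty$, the uniform integrability of $\{\xi_k/b_k\}$ implies that the right-hand side
of (8.3.9) is $o(b_k)$ uniformly in $k$ (i.e. it admits a bound $\varepsilon(n) b_k$, where $\varepsilon(n) \to 0$ as
$n \to \infty$ and does not depend on $k$). Therefore
\[ M_1(\tau) = \frac{1}{b(n)} \sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_k|;\ |\xi_k| > \tau b(n)\bigr) \to 0 \]
as $n \to \infty$, and condition (8.3.8) is met. The theorem is proved.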


Remark 8.3.2 If, in the context of the law of large numbers, we are interested in
convergence in probability only, then we can generalise Theorem 8.3.3. In particular,
convergence
\[ \frac{S_n}{b(n)} \xrightarrow{p} 0 \]
will still hold if a finite number of the summands $\xi_k$ (e.g., for $k \le l$, $l$ being fixed)
are completely arbitrary (they can even fail to have expectations) and the sequence
$\xi_k^* = \xi_{k+l}$, $k \ge 1$, satisfies the conditions of Theorem 8.3.3, where $b(n)$ is defined for
the variables $\xi_k^*$ and has the property $b(n-l)/b(n) \to 1$ as $n \to \infty$.

This assertion follows from the fact that
\[ \frac{S_n}{b(n)} = \frac{S_l}{b(n)} + \frac{S_n - S_l}{b(n-l)} \cdot \frac{b(n-l)}{b(n)}, \qquad \frac{S_l}{b(n)} \xrightarrow{p} 0, \qquad \frac{b(n-l)}{b(n)} \to 1, \]
and by Theorem 8.3.3
\[ \frac{S_n - S_l}{b(n-l)} \xrightarrow{p} 0 \quad \text{as } n \to \infty. \]


Now we will show that the uniform integrability condition in Theorem 8.3.3
(as well as condition $M_1(\tau) \to 0$) is essential for convergence $\zeta_n \xrightarrow{p} 0$. Consider a
sequence of random variables
\[ \xi_j = \begin{cases} 2^s - 1 & \text{with probability } 2^{-s}, \\ -1 & \text{with probability } 1 - 2^{-s} \end{cases} \]
for $j \in I_s := (2^{s-1}, 2^s]$, $s = 1, 2, \dots$; $\xi_1 = 0$. Then $\mathbf{E}\xi_j = 0$, $\mathbf{E}|\xi_j| = 2(1 - 2^{-s})$ for
$j \in I_s$, and, for $n = 2^k$ one has
\[ b(n) = \sum_{s=1}^{k} 2(1 - 2^{-s}) |I_s|, \]
where $|I_s| = 2^s - 2^{s-1} = 2^{s-1}$ is the number of points in $I_s$. Hence, as $k \to \infty$,
\[ b(n) \sim 2\bigl[(1 - 2^{-k}) 2^{k-1} + (1 - 2^{-k+1}) 2^{k-2} + \dots\bigr] \sim 2^k + 2^{k-1} + \dots \sim 2^{k+1} = 2n. \]

Observe that the uniform integrability condition is clearly not met here. The
distribution of the number $\nu^{(s)}$ of jumps of magnitude $2^s - 1$ on the interval $I_s$ converges,
as $s \to \infty$, to the Poisson distribution with parameter $1/2 = \lim_{s \to \infty} 2^{-s}|I_s|$, while
the distribution of $2^{-s}(S_{2^s} - S_{2^{s-1}})$ converges to the distribution of $\nu - 1/2$, where
$\nu \in \boldsymbol\Pi_{1/2}$. Hence, assuming that $n = 2^k$, and partitioning the segment $[2, n]$ into the
intervals $(2^{s-1}, 2^s]$, $s = 1, \dots, k$, we obtain that the distribution of $S_n/n$ converges,
as $k \to \infty$, to the distribution of $\zeta$ defined by
\[ \frac{S_n}{n} = 2^{-k} \sum_{s=1}^{k} 2^s\, \frac{S_{2^s} - S_{2^{s-1}}}{2^s} \Rightarrow \sum_{l=0}^{\infty} (\nu_l - 1/2)\, 2^{-l} =: \zeta, \]
where $\nu_l$, $l = 0, 1, \dots$, are independent copies of $\nu$. Clearly, $\zeta \not\equiv 0$, and so
convergence $S_n/n \xrightarrow{p} 0$ fails to take place.
Let us return to arbitrary $\xi_{k,n}$. In order for [D$_1$] to hold it suffices that the
following condition is met: for some $s$, $2 \ge s > 1$,
\[ \sum_{k=1}^{n} \mathbf{E}|\xi_{k,n}|^s \to 0. \tag*{[L$_s$]} \]

This assertion is evident, since $g_1(x) \le |x|^s$ for $2 \ge s > 1$. Conditions [L$_s$] could
be called the modified Lyapunov conditions (cf. the Lyapunov condition [L$_s$] in
Sect. 8.4).

To prove Theorem 8.3.2, we used the so-called “truncated versions” $\xi_{k,n}'$ of the
random variables $\xi_{k,n}$. Now we will consider yet another variant of the law of large
numbers, in which conditions are expressed in terms of truncated random variables.

Denote by $\xi^{(N)}$ the result of truncation of the random variable $\xi$ at level $N$:
\[ \xi^{(N)} = \max\bigl[-N, \min(N, \xi)\bigr]. \]


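Theorem 8.3.4 Let the sequence of random variables $\{\xi_k\}$ in (8.3.7) satisfy the
following condition: for any given $\varepsilon > 0$, there exist $N_k$ such that
\[ \frac{1}{b(n)} \sum_{k=1}^{n} \mathbf{E}\bigl|\xi_k - \xi_k^{(N_k)}\bigr| < \varepsilon, \qquad \frac{1}{b^2(n)} \sum_{k=1}^{n} N_k^2 \to 0. \]

Then the sequence $\{\zeta_n\}$ converges to zero in mean: $\mathbf{E}|\zeta_n| \to 0$.


Proof Clearly $a_k^{(N_k)} := \mathbf{E}\xi_k^{(N_k)} \to a_k$ as $N_k \to \infty$ and $|a_k^{(N_k)}| \le N_k$. Further, we
have
\[ \mathbf{E}|\zeta_n| = \frac{1}{b(n)}\, \mathbf{E}\Bigl|\sum_{k=1}^{n} (\xi_k - a_k)\Bigr| \le \frac{1}{b(n)} \sum_{k=1}^{n} \mathbf{E}\bigl|\xi_k - \xi_k^{(N_k)}\bigr| + \frac{1}{b(n)}\, \mathbf{E}\Bigl|\sum_{k=1}^{n} \bigl(\xi_k^{(N_k)} - a_k^{(N_k)}\bigr)\Bigr| + \frac{1}{b(n)} \sum_{k=1}^{n} \bigl|a_k^{(N_k)} - a_k\bigr|. \]

Here the second term on the right-hand side converges to zero, since the sum under
the expectation satisfies the conditions of Theorem 8.3.1 and is bounded. But the
first and the last terms do not exceed $\varepsilon$. Since the left-hand side does not depend on
$\varepsilon$, we have $\mathbf{E}|\zeta_n| \to 0$ as $n \to \infty$.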


Corollary 8.3.2 If $b(n) = n$ and, for sufficiently large $N$ and all $k \le n$,
\[ \mathbf{E}\bigl|\xi_k - \xi_k^{(N)}\bigr| < \varepsilon, \]
then $\mathbf{E}|\zeta_n| \to 0$.




The corollary follows from Theorem 8.3.4, since the conditions of the corollary 
clearly imply the conditions of Theorem 8.3.4. 

It is obvious that, for identically distributed $\xi_k$, the conditions of Corollary 8.3.2
are always met, and we again obtain a generalisation of Theorem 8.1.1 and
Corollary 8.1.1.

If E|&|" < co for r > 1, then we can also establish in a similar way that 


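Remark 8.3.3 Condition [D$_1$] (or [M$_1$]) is not necessary for convergence $\zeta_n \xrightarrow{p} 0$
even when (8.3.2) and (8.3.5) hold, as the following example demonstrates. Let $\xi_{k,n}$
assume the values $-n$, $0$, and $n$ with probabilities $1/n^2$, $1 - 2/n^2$, and $1/n^2$,
respectively. Here $\zeta_n \xrightarrow{p} 0$, since $\mathbf{P}(\zeta_n \ne 0) \le \mathbf{P}\bigl(\bigcup \{\xi_{k,n} \ne 0\}\bigr) \le 2/n \to 0$, $\mathbf{E}|\xi_{k,n}| =
2/n \to 0$ and $M_1 = \sum \mathbf{E}|\xi_{k,n}| = 2 < \infty$. At the same time, $\sum \mathbf{E}\bigl(|\xi_{k,n}|;\ |\xi_{k,n}| \ge
1\bigr) = 2 \not\to 0$, so that conditions [D$_1$] and [M$_1$] are not satisfied.

However, if we require that
\[ \xi_{k,n} \ge -\varepsilon_{k,n}, \qquad \varepsilon_{k,n} \ge 0, \qquad \max_{k \le n} \varepsilon_{k,n} \to 0, \qquad \sum_{k=1}^{n} \varepsilon_{k,n} \le c < \infty, \tag{8.3.10} \]
then condition [D$_1$] will become necessary for convergence $\zeta_n \xrightarrow{p} 0$.
Before proving that assertion we will establish several auxiliary relations that
will be useful in the sequel. As above, put $\Delta_k(t) := \varphi_{k,n}(t) - 1$.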


Lemma 8.3.2 One has
\[ \sum_{k=1}^{n} |\Delta_k(t)| \le |t| M_1. \]

If condition [S] holds, then for each $t$, as $n \to \infty$,
\[ \max_{k \le n} |\Delta_k(t)| \to 0. \]

If a random variable $\xi$ with $\mathbf{E}\xi = 0$ is bounded from the left: $\xi > -c$, $c > 0$, then
$\mathbf{E}|\xi| \le 2c$.

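Proof By Lemma 7.4.1,
\[ |\Delta_k(t)| \le \mathbf{E}\bigl|e^{it\xi_{k,n}} - 1\bigr| \le |t|\, \mathbf{E}|\xi_{k,n}|, \qquad \sum_{k=1}^{n} |\Delta_k(t)| \le |t| M_1. \]

Further,
\[ |\Delta_k(t)| \le \mathbf{E}\bigl(\bigl|e^{it\xi_{k,n}} - 1\bigr|;\ |\xi_{k,n}| \le \varepsilon\bigr) + \mathbf{E}\bigl(\bigl|e^{it\xi_{k,n}} - 1\bigr|;\ |\xi_{k,n}| > \varepsilon\bigr) \le |t|\varepsilon + 2\mathbf{P}\bigl(|\xi_{k,n}| > \varepsilon\bigr). \]
Since $\varepsilon$ is arbitrary here, the second assertion of the lemma now follows from
condition [S].
Put
\[ \xi^+ := \max(0, \xi) \ge 0, \qquad \xi^- := -(\xi - \xi^+) \ge 0. \]

Then $\mathbf{E}\xi = \mathbf{E}\xi^+ - \mathbf{E}\xi^- = 0$ and $\mathbf{E}|\xi| = \mathbf{E}\xi^+ + \mathbf{E}\xi^- = 2\mathbf{E}\xi^- \le 2c$. The lemma is
proved.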


From the last assertion of the lemma it follows that (8.3.10) implies (8.3.2) and 
(8.3.5). 


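Lemma 8.3.3 Let conditions [S] and (8.3.2) be satisfied. A necessary and sufficient
condition for convergence $\varphi_{\zeta_n}(t) \to \varphi(t)$ is that
\[ \sum_{k=1}^{n} \Delta_k(t) \to \ln \varphi(t). \]

Proof Observe that
\[ \operatorname{Re} \Delta_k(t) = \operatorname{Re}\bigl(\varphi_{k,n}(t) - 1\bigr) \le 0, \qquad \bigl|e^{\Delta_k(t)}\bigr| \le 1, \]
and therefore, by Lemma 7.4.2,
\[ \Bigl|\varphi_{\zeta_n}(t) - e^{\sum \Delta_k(t)}\Bigr| = \Bigl|\prod_{k=1}^{n} \varphi_{k,n}(t) - \prod_{k=1}^{n} e^{\Delta_k(t)}\Bigr| \le \sum_{k=1}^{n} \bigl|\varphi_{k,n}(t) - e^{\Delta_k(t)}\bigr| = \sum_{k=1}^{n} \bigl|e^{\Delta_k(t)} - 1 - \Delta_k(t)\bigr| \]
\[ \le \frac{1}{2} \sum_{k=1}^{n} |\Delta_k(t)|^2 \le \frac{1}{2} \max_{k} |\Delta_k(t)| \sum_{k=1}^{n} |\Delta_k(t)|. \]

By Lemma 8.3.2 and conditions [S] and (8.3.2), the expression on the left-hand side
converges to 0 as $n \to \infty$. Therefore, if $\varphi_{\zeta_n}(t) \to \varphi(t)$ then $\exp\{\sum \Delta_k(t)\} \to \varphi(t)$,
and vice versa. The lemma is proved.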


The next assertion complements Theorem 8.3.1. 




Theorem 8.3.5 Assume that relations (8.3.1) and (8.3.10) hold. Then condition
[D1] (or condition [M1]) is necessary for the law of large numbers.


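Proof If the law of large numbers holds then $\varphi_{\zeta_n}(t) \to 1$ and, hence by Lemma 8.3.3
(recall that (8.3.10) implies (8.3.2), (8.3.5) and [S])

$$\sum_{k=1}^{n} \Delta_k(t) = \sum_{k=1}^{n} \mathbf{E}\bigl(e^{it\xi_{k,n}} - 1 - it\xi_{k,n}\bigr) \to 0.$$

Moreover, by Lemma 7.4.1

$$\sum_{k=1}^{n} \bigl| \mathbf{E}\bigl(e^{it\xi_{k,n}} - 1 - it\xi_{k,n};\ |\xi_{k,n}| \le \varepsilon_{k,n}\bigr) \bigr| \le \frac{1}{2} \sum_{k=1}^{n} \mathbf{E}\bigl(|t\xi_{k,n}|^2;\ |\xi_{k,n}| \le \varepsilon_{k,n}\bigr) \le \frac{t^2}{2} \sum_{k=1}^{n} \varepsilon_{k,n}^2 \le \frac{t^2}{2} \max_{k \le n} \varepsilon_{k,n} \sum_{k=1}^{n} \varepsilon_{k,n} \to 0.$$

Therefore, if the law of large numbers holds, then by virtue of (8.3.10)

$$\sum_{k=1}^{n} \mathbf{E}\bigl(e^{it\xi_{k,n}} - 1 - it\xi_{k,n};\ \xi_{k,n} > \varepsilon_{k,n}\bigr) \to 0.$$

Consider the function $\alpha(x) = (e^{ix} - 1)/(ix)$. It is not hard to see that the inequality
$|\alpha(x)| \le 1$ proved in Lemma 7.4.1 is strict for $x > \tau > 0$, and hence there exists a
$\delta(\tau) > 0$ for $\tau > 0$ such that $\operatorname{Re}(1 - \alpha(x)) \ge \delta(\tau)$ for $x > \tau$. This is equivalent to
$\operatorname{Im}(1 + ix - e^{ix}) \ge \delta(\tau) x$, so that

$$x \le \frac{1}{\delta(\tau)} \operatorname{Im}\bigl(1 + ix - e^{ix}\bigr) \quad \text{for } x > \tau.$$

From this we find that (taking $t = 1$; for all $n$ large enough that $\max_{k \le n} \varepsilon_{k,n} < \tau$)

$$M_1(\tau) = \sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_{k,n}|;\ |\xi_{k,n}| > \tau\bigr) = \sum_{k=1}^{n} \mathbf{E}\bigl(\xi_{k,n};\ \xi_{k,n} > \tau\bigr) \le \frac{1}{\delta(\tau)} \operatorname{Im} \sum_{k=1}^{n} \mathbf{E}\bigl(1 + i\xi_{k,n} - e^{i\xi_{k,n}};\ \xi_{k,n} > \varepsilon_{k,n}\bigr) \to 0.$$

Thus condition [M1] holds. Together with relation (8.3.2), that follows from
(8.3.10), this condition implies [D1]. The theorem is proved.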




There seem to exist some conditions that are wider than (8.3.10) and under which
condition [D1] is necessary for convergence $\zeta_n \to 0$ in mean (condition (8.3.10) is
too restrictive).


8.4 The Central Limit Theorem for Sums of Arbitrary 
Independent Random Variables 


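As in Sect. 8.3, we consider here a triangular array of random variables $\xi_{1,n}, \ldots, \xi_{n,n}$
and their sums

$$\zeta_n = \sum_{k=1}^{n} \xi_{k,n}. \tag{8.4.1}$$

We will assume that $\xi_{k,n}$ have finite second moments:

$$\sigma_{k,n}^2 := \operatorname{Var}(\xi_{k,n}) < \infty,$$

and suppose, without loss of generality, that

$$\mathbf{E}\xi_{k,n} = 0, \qquad \sum_{k=1}^{n} \sigma_{k,n}^2 = \operatorname{Var}(\zeta_n) = 1. \tag{8.4.2}$$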
k=1 


We introduce the following condition: for some $s > 2$,

$$D_2 := \sum_{k=1}^{n} \mathbf{E} \min\bigl(\xi_{k,n}^2, |\xi_{k,n}|^s\bigr) \to 0 \quad \text{as } n \to \infty, \tag*{[D2]}$$

which is to play an important role in what follows. Our arguments related to condition
[D2] and also to conditions [M2] and [L_s] to be introduced below will be quite
similar to the ones from Sect. 8.3 that were related to conditions [D1], [M1] and
[L_s].
We also introduce the Lindeberg condition: for any $\tau > 0$, as $n \to \infty$,

$$M_2(\tau) := \sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_{k,n}|^2;\ |\xi_{k,n}| > \tau\bigr) \to 0. \tag*{[M2]}$$
k=1 


The following assertion is an analogue of Lemma 8.3.1. 
Lemma 8.4.1 1. $\{[\mathrm{M2}] \cap (8.4.2)\} \subset [\mathrm{D2}]$. 2. $[\mathrm{D2}] \subset [\mathrm{M2}]$.

That is, conditions [M2] and (8.4.2) imply [D2], and condition [D2] implies
[M2].

From Lemma 8.4.1 it follows that, under condition (8.4.2), conditions [D2] and
[M2] are equivalent.




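Proof of Lemma 8.4.1 1. Let conditions [M2] and (8.4.2) be met. Put

$$g_2(x) = \min(x^2, |x|^s), \quad s > 2.$$

Then (cf. (8.3.3), (8.3.4); $\tau \le 1$)

$$D_2 = \sum_{k=1}^{n} \mathbf{E} g_2(\xi_{k,n}) \le \sum_{k=1}^{n} \mathbf{E}\bigl(\xi_{k,n}^2;\ |\xi_{k,n}| > \tau\bigr) + \sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_{k,n}|^s;\ |\xi_{k,n}| \le \tau\bigr) \le M_2(\tau) + \tau^{s-2} M_2(0) = M_2(\tau) + \tau^{s-2}.$$

Since $\tau$ is arbitrary, we have $D_2 \to 0$ as $n \to \infty$.
2. Conversely, suppose that [D2] holds. Then, for $\tau \le 1$,

$$M_2(\tau) \le \sum_{k=1}^{n} \mathbf{E}\bigl(\xi_{k,n}^2;\ |\xi_{k,n}| > 1\bigr) + \frac{1}{\tau^{s-2}} \sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_{k,n}|^s;\ \tau < |\xi_{k,n}| \le 1\bigr) \le \frac{1}{\tau^{s-2}} D_2 \to 0$$

for any $\tau > 0$, as $n \to \infty$. The lemma is proved.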


Lemma 8.4.1 also implies that if (8.4.2) holds, then condition [D2] is “invariant” 
with respect to s > 2. 
Condition [D2] can be stated in a more general form:

$$\sum_{k=1}^{n} \mathbf{E}\, \xi_{k,n}^2\, h(|\xi_{k,n}|) \to 0,$$

where $h(x)$ is any function for which $h(x) > 0$ for $x > 0$, $h(x) \uparrow$, $h(x) \to 0$ as
$x \to 0$, and $h(x) \to c < \infty$ as $x \to \infty$. All the key properties of condition [D2] will
then be preserved. The Lindeberg condition clarifies the meaning of condition [D2]
from a somewhat different point of view. In Lindeberg's condition, $h(x) = I_{(\tau, \infty)}(x)$,
$\tau \in (0, 1)$. A similar remark may be made with regard to conditions [D1] and [M1]
in Sect. 8.3.

In a way similar to what we did in Sect. 8.3 when discussing condition [M1], one
can easily verify that condition [M2] implies convergence (see (8.3.6))

$$\max_{k \le n} \operatorname{Var}(\xi_{k,n}) \to 0 \tag{8.4.3}$$

and the negligibility of $\xi_{k,n}$ (property [S]). Moreover, one obviously has the inequality

$$M_1(\tau) \le \frac{1}{\tau} M_2(\tau).$$


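For a given fixed (independent of $n$) sequence $\{\xi_k\}$ of independent random variables,

$$S_n = \sum_{k=1}^{n} \xi_k, \qquad \mathbf{E}\xi_k = a_k, \qquad \operatorname{Var}(\xi_k) = \sigma_k^2, \tag{8.4.4}$$

one considers the asymptotic behaviour of the normed sums

$$\zeta_n = \frac{1}{B_n} \Bigl( S_n - \sum_{k=1}^{n} a_k \Bigr), \qquad B_n^2 = \sum_{k=1}^{n} \sigma_k^2, \tag{8.4.5}$$

that are clearly also of the form (8.4.1) with $\xi_{k,n} = (\xi_k - a_k)/B_n$.
Conditions [D2] and [M2] for $\xi_k$ will take the form

$$D_2 = \frac{1}{B_n^2} \sum_{k=1}^{n} \mathbf{E} \min\Bigl( (\xi_k - a_k)^2,\ \frac{|\xi_k - a_k|^s}{B_n^{s-2}} \Bigr) \to 0, \quad s > 2;$$

$$M_2(\tau) = \frac{1}{B_n^2} \sum_{k=1}^{n} \mathbf{E}\bigl( (\xi_k - a_k)^2;\ |\xi_k - a_k| > \tau B_n \bigr) \to 0, \quad \tau > 0. \tag{8.4.6}$$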


Theorem 8.4.1 (The Central Limit Theorem) If the sequences of random variables
$\{\xi_{k,n}\}_{k=1}^{n}$, $n = 1, 2, \ldots$, satisfy conditions (8.4.2) and [D2] (or [M2]) then, as
$n \to \infty$, $\mathbf{P}(\zeta_n < x) \to \Phi(x)$ uniformly in $x$.


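Proof It suffices to verify that

$$\varphi_{\zeta_n}(t) = \prod_{k=1}^{n} \varphi_{k,n}(t) \to e^{-t^2/2}.$$

By Lemma 7.4.2,

$$\bigl| \varphi_{\zeta_n}(t) - e^{-t^2/2} \bigr| = \Bigl| \prod_{k=1}^{n} \varphi_{k,n}(t) - \prod_{k=1}^{n} e^{-t^2 \sigma_{k,n}^2/2} \Bigr| \le \sum_{k=1}^{n} \bigl| \varphi_{k,n}(t) - e^{-t^2 \sigma_{k,n}^2/2} \bigr|$$

$$\le \sum_{k=1}^{n} \Bigl| \varphi_{k,n}(t) - 1 + \frac{1}{2} t^2 \sigma_{k,n}^2 \Bigr| + \sum_{k=1}^{n} \Bigl| e^{-t^2 \sigma_{k,n}^2/2} - 1 + \frac{1}{2} t^2 \sigma_{k,n}^2 \Bigr|. \tag{8.4.7}$$

Since by Lemma 7.4.1, for $s \le 3$,

$$\Bigl| e^{ix} - 1 - ix + \frac{x^2}{2} \Bigr| \le \min\Bigl( x^2, \frac{|x|^3}{6} \Bigr) \le g_2(x)$$

(see the definition of the function $g_2$ in the beginning of the proof of Lemma 8.4.1),
the first sum on the right-hand side of (8.4.7) does not exceed

$$\sum_{k=1}^{n} \Bigl| \mathbf{E}\Bigl( e^{it\xi_{k,n}} - 1 - it\xi_{k,n} + \frac{1}{2} t^2 \xi_{k,n}^2 \Bigr) \Bigr| \le \sum_{k=1}^{n} \mathbf{E} g_2(|t\xi_{k,n}|) \le h(t) \sum_{k=1}^{n} \mathbf{E} g_2(|\xi_{k,n}|) = h(t) D_2 \to 0,$$

where $h(t) = \max(t^2, |t|^3)$. The last sum in (8.4.7) (again by Lemma 7.4.1) does
not exceed (see (8.4.2) and (8.4.3))

$$\frac{t^4}{8} \sum_{k=1}^{n} \sigma_{k,n}^4 \le \frac{t^4}{8} \max_{k \le n} \sigma_{k,n}^2 \sum_{k=1}^{n} \sigma_{k,n}^2 = \frac{t^4}{8} \max_{k \le n} \sigma_{k,n}^2 \to 0 \quad \text{as } n \to \infty.$$

The theorem is proved.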


If we change the second relation in (8.4.2) to $\mathbf{E}\zeta_n^2 \to \sigma^2 > 0$, then, introducing
the new random variables $\xi'_{k,n} = \xi_{k,n}/\sqrt{\operatorname{Var} \zeta_n}$ and using continuity theorems, it is
not hard to obtain from Theorem 8.4.1 (see e.g. Lemma 6.2.2) the following assertion,
which sometimes proves to be more useful in applications than Theorem 8.4.1.


Corollary 8.4.1 Assume that $\mathbf{E}\xi_{k,n} = 0$, $\operatorname{Var}(\zeta_n) \to \sigma^2 > 0$, and condition [D2] (or
[M2]) is satisfied. Then $\zeta_n \mathrel{\subset\!\!\Rightarrow} \Phi_{0,\sigma^2}$.


Remark 8.4.1 A sufficient condition for [D2] and [M2] is provided by the more
restrictive Lyapunov condition, the verification of which is sometimes easier. Assume
that (8.4.2) holds. For $s > 2$, the quantity

$$L_s := \sum_{k=1}^{n} \mathbf{E}|\xi_{k,n}|^s$$

is called the Lyapunov fraction of the $s$-th order. The condition

$$L_s \to 0 \quad \text{as } n \to \infty \tag*{[L_s]}$$

is called the Lyapunov condition.


The quantity $L_s$ is called a fraction since for $\xi_{k,n} = (\xi_k - a_k)/B_n$ (where $a_k = \mathbf{E}\xi_k$,
$B_n^2 = \sum_{k=1}^{n} \operatorname{Var}(\xi_k)$ and $\xi_k$ do not depend on $n$), it has the form

$$L_s = \frac{1}{B_n^s} \sum_{k=1}^{n} \mathbf{E}|\xi_k - a_k|^s.$$

If the $\xi_k$ are identically distributed, $a_k = a$, $\operatorname{Var}(\xi_k) = \sigma^2$, and $\mathbf{E}|\xi_k - a|^s = \mu < \infty$,
then

$$L_s = \frac{\mu}{\sigma^s n^{(s-2)/2}} \to 0.$$

The sufficiency of the Lyapunov condition follows from the obvious inequalities
$g_2(x) \le |x|^s$ for any $s$, $D_2 \le L_s$.

In the case of (8.4.4) and (8.4.5) we can give a sufficient condition for the
integral limit theorem that is very close to the Lindeberg condition [M2]; the former
condition elucidates to some extent the essence of the latter (cf. Theorem 8.3.3), and
in many cases it is easier to verify. Put $\bar\sigma_n = \max_{k \le n} \sigma_k$. Theorem 8.4.1 implies the
following assertion which is a direct extension of Theorem 8.2.1.


Theorem 8.4.2 Let conditions (8.4.4) and (8.4.5) be satisfied, the sequence of
normalised random variables $\xi_k^2/\sigma_k^2$ be uniformly integrable and $\bar\sigma_n = o(B_n)$ as
$n \to \infty$. Then $\zeta_n \mathrel{\subset\!\!\Rightarrow} \Phi_{0,1}$.


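Proof of Theorem 8.4.2 repeats, to some extent, the proof of Theorem 8.3.3. For
simplicity assume that $a_k = 0$. Then

$$\mathbf{E}\bigl(\xi_k^2;\ |\xi_k| > \tau B_n\bigr) \le \sigma_k^2\, \mathbf{E}\Bigl( \frac{\xi_k^2}{\sigma_k^2};\ \Bigl| \frac{\xi_k}{\sigma_k} \Bigr| > \tau \frac{B_n}{\bar\sigma_n} \Bigr), \tag{8.4.8}$$

where $B_n/\bar\sigma_n \to \infty$. Hence, it follows from the uniform integrability of $\{\xi_k^2/\sigma_k^2\}$ that
the right-hand side of (8.4.8) is $o(\sigma_k^2)$ uniformly in $k$. This means that

$$M_2(\tau) = \frac{1}{B_n^2} \sum_{k=1}^{n} \mathbf{E}\bigl(\xi_k^2;\ |\xi_k| > \tau B_n\bigr) \to 0$$

as $n \to \infty$ and condition (8.4.6) (or condition [M2]) is satisfied. The theorem is
proved.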


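Remark 8.4.2 We can generalise the assertion of Theorem 8.4.2 (cf. Remark 8.3.3).
In particular, convergence $\zeta_n \mathrel{\subset\!\!\Rightarrow} \Phi_{0,1}$ still takes place if a finite number of summands
$\xi_k$ (e.g., for $k \le l$, $l$ being fixed) are completely arbitrary, and the sequence
$\xi_k^* := \xi_{k+l}$, $k \ge 1$, satisfies the conditions of Theorem 8.4.2, in which we put
$\sigma_k^2 = \operatorname{Var}(\xi_k^*)$, $B_n^2 = \sum_{k=1}^{n} \sigma_k^2$, and it is also assumed that $B_{n-l}/B_n \to 1$ as $n \to \infty$.

This assertion follows from the fact that

$$\frac{S_n}{B_n} = \frac{S_l}{B_n} + \frac{S_n - S_l}{B_{n-l}} \cdot \frac{B_{n-l}}{B_n},$$

where $S_l/B_n \xrightarrow{p} 0$, $B_{n-l}/B_n \to 1$ and, by Theorem 8.4.2,
$(S_n - S_l)/B_{n-l} \mathrel{\subset\!\!\Rightarrow} \Phi_{0,1}$ as $n \to \infty$.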




Remark 8.4.3 The uniform integrability condition that was used in Theorem 8.4.2
can be used for the triangular array scheme as well. In this more general case the
uniform integrability should mean the following: the sequences $\eta_{1,n}, \ldots, \eta_{n,n}$,
$n = 1, 2, \ldots$, in the triangular array scheme are uniformly integrable if there exists a
function $\varepsilon(N) \downarrow 0$ as $N \uparrow \infty$ such that, for all $n$,

$$\max_{j \le n} \mathbf{E}\bigl(|\eta_{j,n}|;\ |\eta_{j,n}| > N\bigr) \le \varepsilon(N).$$

It is not hard to see that, with such an interpretation of uniform integrability,
the assertion of Theorem 8.4.2 holds true for the triangular array scheme as well
provided that the sequence $\{\xi_{j,n}^2/\sigma_{j,n}^2\}$ is uniformly integrable and
$\max_{j \le n} \sigma_{j,n} = o(1)$ as $n \to \infty$.


Example 8.4.1 We will clarify the difference between the Lindeberg condition and
uniform integrability of $\{\xi_k^2/\sigma_k^2\}$ in the following example. Let $\eta_k$ be independent
bounded identically distributed random variables, $\mathbf{E}\eta_k = 0$, $\mathbf{D}\eta_k = 1$ and $g(k) > \sqrt{2}$
be an arbitrary function. Put

$$\xi_k := \begin{cases} \eta_k & \text{with probability } 1 - 2g^{-2}(k), \\ \pm g(k) & \text{with probability } g^{-2}(k). \end{cases}$$

Then clearly $\mathbf{E}\xi_k = 0$, $\sigma_k^2 := \mathbf{D}\xi_k = 3 - 2g^{-2}(k) \in (2, 3)$ and $B_n^2 \in (2n, 3n)$. The
uniform integrability of $\{\xi_k^2/\sigma_k^2\}$, or the uniform integrability of $\{\xi_k^2\}$ which means the
same in our case, excludes the case where $g(k) \to \infty$ as $k \to \infty$. The Lindeberg
condition is wider and allows the growth of $g(k)$, except for the case where $g(k) > c\sqrt{k}$.
If $g(k) = o(\sqrt{k})$, then the Lindeberg condition is satisfied because, for any
fixed $\tau > 0$,

$$\mathbf{E}\bigl(\xi_k^2;\ |\xi_k| > \tau\sqrt{k}\bigr) = 0$$

for all large enough $k$.
Remark 8.4.4 Let us show that condition [M2] (or [D2]) is essential for the central
limit theorem. Consider random variables

$$\xi_{k,n} = \begin{cases} \pm 1/\sqrt{2} & \text{with probability } 1/n, \\ 0 & \text{with probability } 1 - 2/n. \end{cases}$$

They satisfy conditions (8.4.2), [S], but not the Lindeberg condition as $M_2(\tau) = 1$
for $\tau < 1/\sqrt{2}$. The number $\nu_n$ of non-zero summands converges in distribution to
a random variable $\nu$ having the Poisson distribution with parameter 2. Therefore, $\zeta_n$
will clearly converge in distribution not to the normal law, but to $\sum_{j=1}^{\nu} \gamma_j$, where
$\gamma_j$ are independent and take values $\pm 1/\sqrt{2}$ with probability 1/2.
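
The failure of normality in Remark 8.4.4 can be checked numerically. The following is a minimal sketch (the function names are illustrative and not part of the text): it compares the probability that all $n$ summands vanish with the Poisson value $e^{-2}$, and the binomial law of the number of non-zero summands with the Poisson law with parameter 2. A limit law with an atom of mass about $e^{-2}$ at zero cannot be normal.

```python
import math


def prob_all_zero(n: int) -> float:
    """P(all n summands vanish) when each one is non-zero with probability 2/n."""
    return (1.0 - 2.0 / n) ** n


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(nu_n = k) for the number nu_n of non-zero summands, nu_n ~ Bin(n, p)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(nu = k) for the Poisson limit with parameter lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


for n in (10, 100, 1000):
    # The atom of zeta_n at zero is at least prob_all_zero(n), which tends to exp(-2).
    print(n, prob_all_zero(n), binomial_pmf(2, n, 2.0 / n), poisson_pmf(2, 2.0))
```

The printed columns approach $e^{-2}$ and the Poisson probability $2e^{-2}$ respectively as $n$ grows.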




Note also that conditions [D2] or [M2] are not necessary for convergence of
the distributions of $\zeta_n$ to the normal distribution. Indeed, consider the following
example: $\xi_{1,n} \mathrel{\subset\!\!=} \Phi_{0,1}$, $\xi_{2,n} = \cdots = \xi_{n,n} = 0$. Conditions (8.4.2) are clearly met,
$\mathbf{P}(\zeta_n < x) = \Phi(x)$, but the variables $\xi_{k,n}$ are not negligible and therefore do not
satisfy conditions [D2] and [M2].

If, however, as well as convergence $\zeta_n \mathrel{\subset\!\!\Rightarrow} \Phi_{0,1}$ we require that the $\xi_{k,n}$ are
negligible, then conditions [D2] and [M2] become necessary.


Theorem 8.4.3 Suppose that the sequences of independent random variables
$\{\xi_{k,n}\}_{k=1}^{n}$ satisfy conditions (8.4.2) and [S]. Then condition [D2] (or [M2]) is
necessary and sufficient for convergence $\zeta_n \mathrel{\subset\!\!\Rightarrow} \Phi_{0,1}$.


First note that the assertions of Lemmas 8.3.2 and 8.3.3 remain true, up to some 
inessential modifications, if we substitute conditions (8.3.2) and [S] with (8.4.2) 
and [S]. 


Lemma 8.4.2 Let conditions (8.4.2) and [S] hold. Then ($\Delta_k(t) = \varphi_{k,n}(t) - 1$)

$$\max_{k \le n} |\Delta_k(t)| \to 0, \qquad \sum_{k=1}^{n} |\Delta_k(t)| \le \frac{t^2}{2},$$

and the assertion of Lemma 8.3.3, that the convergence (8.3.10) is necessary and
sufficient for convergence $\varphi_{\zeta_n}(t) \to \varphi(t)$, remains completely true.


Proof We can retain all the arguments in the proofs of Lemmas 8.3.2 and 8.3.3
except for one place where $\sum |\Delta_k(t)|$ is bounded. Under the new conditions, by
Lemma 7.4.1, we have

$$|\Delta_k(t)| = \bigl| \varphi_{k,n}(t) - 1 - it\mathbf{E}\xi_{k,n} \bigr| \le \mathbf{E}\bigl| e^{it\xi_{k,n}} - 1 - it\xi_{k,n} \bigr| \le \frac{t^2}{2} \mathbf{E}\xi_{k,n}^2,$$

so that

$$\sum_{k=1}^{n} |\Delta_k(t)| \le \frac{t^2}{2}.$$

No other changes in the proofs of Lemmas 8.3.2 and 8.3.3 are needed.


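Proof of Theorem 8.4.3 Sufficiency is already proved. To prove necessity, we make
use of Lemma 8.4.2. If $\varphi_{\zeta_n}(t) \to e^{-t^2/2}$, then by virtue of that lemma, for
$\Delta_k(t) = \varphi_{k,n}(t) - 1$, one has

$$\sum_{k=1}^{n} \Delta_k(t) \to \ln \varphi(t) = -\frac{t^2}{2}.$$

For $t = 1$ the above relation can be written in the form

$$R_n := \sum_{k=1}^{n} \mathbf{E}\Bigl( e^{i\xi_{k,n}} - 1 - i\xi_{k,n} + \frac{1}{2}\xi_{k,n}^2 \Bigr) \to 0. \tag{8.4.9}$$

Put $\alpha(x) := (e^{ix} - 1 - ix)/x^2$. It is not hard to see that the inequality $|\alpha(x)| \le 1/2$
proved in Lemma 7.4.1 is strict for $x \ne 0$, and

$$\sup_{|x| \ge \tau} |\alpha(x)| \le \frac{1}{2} - \delta(\tau),$$

where $\delta(\tau) > 0$ for $\tau > 0$. This means that, for $|x| \ge \tau > 0$,

$$\operatorname{Re} \alpha(x) + \frac{1}{2} \ge \delta(\tau) > 0, \qquad x^2 \le \frac{1}{\delta(\tau)} \operatorname{Re}\Bigl( e^{ix} - 1 - ix + \frac{x^2}{2} \Bigr),$$

$$\mathbf{E}\bigl(\xi_{k,n}^2;\ |\xi_{k,n}| > \tau\bigr) \le \frac{1}{\delta(\tau)} \operatorname{Re} \mathbf{E}\Bigl( e^{i\xi_{k,n}} - 1 - i\xi_{k,n} + \frac{1}{2}\xi_{k,n}^2 \Bigr),$$

and hence by virtue of (8.4.9), for any $\tau > 0$,

$$M_2(\tau) \le \frac{1}{\delta(\tau)} |R_n| \to 0$$

as $n \to \infty$. The theorem is proved.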


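Corollary 8.4.2 Assume that (8.4.2) holds and

$$\max_{k \le n} \operatorname{Var}(\xi_{k,n}) \to 0. \tag{8.4.10}$$

Then a necessary and sufficient condition for convergence $\zeta_n \mathrel{\subset\!\!\Rightarrow} \Phi_{0,1}$ is that

$$\eta_n := \sum_{k=1}^{n} \xi_{k,n}^2 \mathrel{\subset\!\!\Rightarrow} I_1$$

(or that $\eta_n \xrightarrow{p} 1$).


Proof Let $\eta_n \mathrel{\subset\!\!\Rightarrow} I_1$. The random variables $\xi'_{k,n} := \xi_{k,n}^2 - \sigma_{k,n}^2$ satisfy, by virtue of
(8.4.10), condition (8.3.10) and satisfy the law of large numbers:

$$\sum_{k=1}^{n} \xi'_{k,n} = \eta_n - 1 \xrightarrow{p} 0.$$

Therefore, by Theorem 8.3.5, the $\xi'_{k,n}$ satisfy condition [M1]: for any $\tau > 0$,

$$\sum_{k=1}^{n} \mathbf{E}\bigl( |\xi_{k,n}^2 - \sigma_{k,n}^2|;\ |\xi_{k,n}^2 - \sigma_{k,n}^2| > \tau \bigr) \to 0. \tag{8.4.11}$$

But by (8.4.10) this condition is clearly equivalent to condition [M2] for $\xi_{k,n}$, and
hence $\zeta_n \mathrel{\subset\!\!\Rightarrow} \Phi_{0,1}$.

Conversely, if $\zeta_n \mathrel{\subset\!\!\Rightarrow} \Phi_{0,1}$, then [M2] holds for $\xi_{k,n}$ which implies (8.4.11). Since,
moreover,

$$\sum_{k=1}^{n} \mathbf{E}|\xi'_{k,n}| \le 2 \sum_{k=1}^{n} \operatorname{Var}(\xi_{k,n}) = 2,$$

relation (8.3.2) holds for $\xi'_{k,n}$ and by Theorem 8.3.1

$$\sum_{k=1}^{n} \xi'_{k,n} = \eta_n - 1 \xrightarrow{p} 0.$$

The corollary is proved.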


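Example 8.4.2 Let $\xi_k$, $k = 1, 2, \ldots$, be independent random variables with distributions

$$\mathbf{P}(\xi_k = k^{\alpha}) = \mathbf{P}(\xi_k = -k^{\alpha}) = \frac{1}{2}.$$

Evidently, $\xi_k$ can be represented as $\xi_k = k^{\alpha} \eta_k$, where $\eta_k \stackrel{d}{=} \eta$ are independent,

$$\mathbf{P}(\eta = 1) = \mathbf{P}(\eta = -1) = \frac{1}{2}, \qquad \operatorname{Var}(\eta) = 1, \qquad \sigma_k^2 = \operatorname{Var}(\xi_k) = k^{2\alpha}.$$

Let us show that, for all $\alpha \ge -1/2$, the random variables $S_n/B_n$ are asymptotically
normal. Since

$$\frac{\xi_k^2}{\sigma_k^2} \stackrel{d}{=} \eta^2$$

are uniformly integrable, by Theorem 8.4.2 it suffices to verify the condition

$$\bar\sigma_n = \max_{k \le n} \sigma_k = o(B_n).$$

In our case $\bar\sigma_n = \max(1, n^{\alpha})$ and, for $\alpha > -1/2$,

$$B_n^2 = \sum_{k=1}^{n} k^{2\alpha} \sim \int_0^n x^{2\alpha}\, dx = \frac{n^{2\alpha+1}}{2\alpha + 1}.$$

For $\alpha = -1/2$, one has

$$B_n^2 = \sum_{k=1}^{n} k^{-1} \sim \ln n.$$

Clearly, in these cases $\bar\sigma_n = o(B_n)$ and the asymptotic normality of $S_n/B_n$ holds.
If $\alpha < -1/2$ then the sequence $B_n$ converges, condition $\bar\sigma_n = 1 = o(B_n)$ is not
satisfied and the asymptotic normality of $S_n/B_n$ fails to take place.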
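
The dichotomy in Example 8.4.2 is easy to observe numerically. The sketch below (the helper name `sigma_bar_over_B` is illustrative, not from the text) evaluates the ratio $\bar\sigma_n / B_n$ for $\sigma_k = k^{\alpha}$: it should decay for $\alpha \ge -1/2$ (only logarithmically slowly at $\alpha = -1/2$) and stay bounded away from zero for $\alpha < -1/2$.

```python
import math


def sigma_bar_over_B(alpha: float, n: int) -> float:
    """Ratio max_{k<=n} sigma_k / B_n for sigma_k = k**alpha, B_n^2 = sum of sigma_k^2."""
    sigmas = [k ** alpha for k in range(1, n + 1)]
    b_n = math.sqrt(sum(s * s for s in sigmas))
    return max(sigmas) / b_n


for alpha in (0.5, 0.0, -0.5, -1.0):
    # One row per alpha: the ratio at n = 10^2, 10^3, 10^4.
    print(alpha, [round(sigma_bar_over_B(alpha, n), 4) for n in (100, 1000, 10000)])
```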




Note that, for a = —1/2, the random variable 


converge to a constant. 


A rather graphical and well-known illustration of the above theorems is the scat- 
tering of shells when shooting at a target. The fact is that the trajectory of a shell is 
influenced by a large number of independent factors of which the individual effects 
are small. These are deviations in the amount of gun powder, in the weight and size 
of a shell, variations in the humidity and temperature of the air, wind direction and 
velocities at different altitudes and so on. As a result, the deviation of a shell from 
the aiming point is described by the normal law with an amazing accuracy. 

Similar observations could be made about errors in measurements when their 
accuracy is affected by many “small” factors. (There even exists a theory of errors 
of which the crucial element is the central limit theorem.) 

On the whole, the central limit theorem has a lot of applications in various areas. 
This is due to its universality and robustness under small deviations from the as- 
sumptions of the theorem, and its relatively high accuracy even for moderate values 
of n. The first two noted qualities mean that: 

(1) the theorem is applicable to variables $\xi_{k,n}$ with any distributions so long as
the variances of $\xi_{k,n}$ exist and are "negligible";

(2) the presence of a "moderate" dependence³ between $\xi_{k,n}$ does not change the
normality of the limiting distribution.

To illustrate the accuracy of the normal approximation, consider the following
example. Let $F_n(x) = \mathbf{P}(S_n/\sqrt{n} < x)$ be the distribution function of the normalised
sum $S_n$ of independent variables $\xi_k$ uniformly distributed over $[-\sqrt{3}, \sqrt{3}]$, so that
$\operatorname{Var}(\xi_k) = 1$. Then it turns out that already for $n = 5$ (!) the maximum of
$|F_n(x) - \Phi(x)|$ over the whole axis of $x$-values does not exceed 0.006 (the maximum is
attained near the points $x = \pm 0.7$).
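
The figure quoted above can be reproduced, up to grid resolution, with an exact formula. The following is a sketch under stated assumptions: it uses the classical Irwin-Hall formula for the distribution function of a sum of $n$ independent Uniform[0, 1] variables (a standard fact, not taken from this text), rescales it to summands uniform on $[-\sqrt{3}, \sqrt{3}]$, and searches a grid on $[-4, 4]$ for the largest deviation from $\Phi$. All function names are illustrative.

```python
import math


def irwin_hall_cdf(t: float, n: int) -> float:
    """Distribution function of the sum of n independent Uniform[0, 1] variables."""
    if t <= 0.0:
        return 0.0
    if t >= n:
        return 1.0
    total = sum((-1) ** k * math.comb(n, k) * (t - k) ** n
                for k in range(int(math.floor(t)) + 1))
    return total / math.factorial(n)


def normal_cdf(x: float) -> float:
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normalised_uniform_sum_cdf(x: float, n: int) -> float:
    """F_n(x) = P(S_n / sqrt(n) < x) for summands uniform on [-sqrt(3), sqrt(3)]."""
    # xi = sqrt(3) * (2U - 1) with U uniform on [0, 1], so S_n = sqrt(3) * (2T - n),
    # where T is the Irwin-Hall sum of the U's.
    t = 0.5 * (x * math.sqrt(n) / math.sqrt(3.0) + n)
    return irwin_hall_cdf(t, n)


def max_deviation(n: int, grid_step: float = 1e-3):
    """Grid search over [-4, 4]; returns (max |F_n - Phi|, a point where it is attained)."""
    best, best_x = 0.0, 0.0
    for i in range(int(8.0 / grid_step) + 1):
        x = -4.0 + i * grid_step
        d = abs(normalised_uniform_sum_cdf(x, n) - normal_cdf(x))
        if d > best:
            best, best_x = d, x
    return best, best_x


deviation, location = max_deviation(5)
print(deviation, location)
```

By symmetry the maximum is attained at two points $\pm x^*$; the grid search reports only one of them.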

And still, despite the above circumstances, one has to be careful when applying 
the central limit theorem. For instance, one cannot expect high accuracy from the 
normal approximation when estimating probabilities of rare events, say when study- 
ing large deviation probabilities (this issue has already been discussed in Sect. 5.3). 


³ There exist several conditions characterising admissible dependence of $\xi_{k,n}$. Such considerations
are beyond the scope of the present book, but can be found in the special literature. See e.g. [20].


After all, the theorem only ensures the smallness of the difference

$$|\Phi(x) - \mathbf{P}(\zeta_n < x)| \tag{8.4.12}$$

for large $n$. Suppose we want to use the normal approximation to find an $x_0$ such
that the event $\{\zeta_n > x_0\}$ would occur on average once in 1000 trials (a problem
of this sort could be encountered by an experimenter who wants to ensure that, in
a single experiment, such an event will not occur). Even if the difference (8.4.12)
does not exceed 0.02 (which can be a good approximation) then, using the normal
approximation, we risk making a serious error. It can turn out, say, that $1 - \Phi(x_0) =
10^{-3}$ while $\mathbf{P}(\zeta_n > x_0) \approx 0.02$, and then the event $\{\zeta_n > x_0\}$ will occur much more
often (on average, once in each 50 trials).
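
The arithmetic behind this warning can be made concrete. In the sketch below, the value `abs_error = 0.02` is the hypothetical uniform bound on (8.4.12) used in the text, and the variable names are illustrative; the code finds the normal quantile $x_0$ with $1 - \Phi(x_0) = 10^{-3}$ and the largest tail probability that is still compatible with that error bound.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Threshold suggested by the normal approximation: 1 - Phi(x0) = 1/1000.
x0 = standard_normal.inv_cdf(1.0 - 1e-3)
normal_tail = 1.0 - standard_normal.cdf(x0)

# Hypothetical uniform bound on |Phi(x) - P(zeta_n < x)|, as in the text.
abs_error = 0.02

# The true tail P(zeta_n > x0) may be as large as normal_tail + abs_error.
worst_case_tail = normal_tail + abs_error
trials_between_events = 1.0 / worst_case_tail

print(x0, normal_tail, worst_case_tail, trials_between_events)
```

An absolute error that is negligible near the centre of the distribution is thus about twenty times larger than the probability being estimated.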

In Chap. 9 we will consider the problem of large deviation probabilities that
enables one to handle such situations. In that case one looks for a function $P(n, x)$
such that $\mathbf{P}(\zeta_n > x)/P(n, x) \to 1$ as $n \to \infty$, $x \to \infty$. The function $P(n, x)$ turns
out to be, generally speaking, different from $1 - \Phi(x)$. We should note however that
using the approximation $P(n, x)$ requires more restrictive conditions on $\{\xi_{k,n}\}$.

In Sect. 8.7 we will consider the so-called integro-local and local limit theorems
that establish convergence of the density of $\zeta_n$ to that of the normal law and enable
one to estimate probabilities of rare events of another sort, say, of the form
$\{a < \zeta_n < b\}$ where $a$ and $b$ are close to each other.


8.5* Another Approach to Proving Limit Theorems. Estimating
Approximation Rates


The approach to proving the principal limit theorems for the distributions of sums of 
random variables that we considered in Sects. 8.1—8.4 was based on the use of ch.f.s. 
However, this is far from the only method of proving such assertions. Nowadays
there exist several rather simple proofs of both the laws of large numbers and the 
central limit theorem that do not use the apparatus of ch.f.s. (This, however, does not 
belittle that powerful, well-developed, and rather universal tool.) Moreover, these 
proofs sometimes enable one to obtain more general results. As an illustration, we 
will give below a proof of the central limit theorem that extends, in a certain sense, 
Theorems 8.4.1 and 8.4.3 and provides an estimate of the convergence rate (although 
not the best one). 

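Along with the random variables $\xi_{k,n}$ in the triangular array scheme under
assumption (8.4.2), consider mutually independent and independent of the sequence
$\{\xi_{k,n}\}_{k=1}^{n}$ random variables $\eta_{k,n} \mathrel{\subset\!\!=} \Phi_{0,\sigma_{k,n}^2}$, $\sigma_{k,n}^2 := \operatorname{Var}(\xi_{k,n})$, so that

$$\eta_n := \sum_{k=1}^{n} \eta_{k,n} \mathrel{\subset\!\!=} \Phi_{0,1}.$$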
k=1 




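Set⁴

$$\mu_{k,n} := \mathbf{E}|\xi_{k,n}|^3, \qquad \nu_{k,n} := \mathbf{E}|\eta_{k,n}|^3 = c_3 \sigma_{k,n}^3 \le c_3 \mu_{k,n},$$

$$\mu_{k,n}^0 := \int |x|^3\, \bigl| d\bigl(F_{k,n}(x) - \Phi_{k,n}(x)\bigr) \bigr| \le \mu_{k,n} + \nu_{k,n},$$

$$L_3 := \sum_{k=1}^{n} \mu_{k,n}, \qquad N_3 := \sum_{k=1}^{n} \nu_{k,n}, \qquad L_3^0 := \sum_{k=1}^{n} \mu_{k,n}^0 \le L_3 + N_3 \le (1 + c_3) L_3.$$

Here $F_{k,n}$ and $\Phi_{k,n}$ are the distribution functions of $\xi_{k,n}$ and $\eta_{k,n}$, respectively. The
quantities $L_3$ and $N_3$ are the third order Lyapunov fractions for the sequences $\{\xi_{k,n}\}$
and $\{\eta_{k,n}\}$. The quantities $\mu_{k,n}^0$ are called the third order pseudomoments and $L_3^0$
the Lyapunov fractions for pseudomoments. Clearly, $N_3 \le c_3 L_3 \to 0$, provided that
the Lyapunov condition holds. As we have already noted, for $\xi_{k,n} = (\xi_k - a_k)/B_n$,
where $a_k = \mathbf{E}\xi_k$, $B_n^2 = \sum_{k=1}^{n} \operatorname{Var}(\xi_k)$, and $\xi_k$ do not depend on $n$, one has

$$L_3 = \frac{1}{B_n^3} \sum_{k=1}^{n} \mu_k, \qquad \mu_k = \mathbf{E}|\xi_k - a_k|^3.$$

If, moreover, $\xi_k$ are identically distributed, then

$$L_3 = \frac{\mu_1}{\sigma^3 \sqrt{n}}.$$

Our first task here is to estimate the closeness of $\mathbf{E}f(\zeta_n)$ to $\mathbf{E}f(\eta_n)$ for
sufficiently smooth $f$. This problem could be of independent interest. Assume that $f$
belongs to the class $C_3$ of all bounded functions with uniformly continuous and
bounded third derivatives: $\sup_x |f^{(3)}(x)| \le f_3$.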


Theorem 8.5.1 If $f \in C_3$ then

$$\bigl| \mathbf{E}f(\zeta_n) - \mathbf{E}f(\eta_n) \bigr| \le \frac{f_3 L_3^0}{6} \le \frac{f_3}{6} (L_3 + N_3). \tag{8.5.1}$$


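Proof Put, for $1 < l \le n$,

$$X_l := \xi_{1,n} + \cdots + \xi_{l-1,n} + \eta_{l,n} + \cdots + \eta_{n,n},$$
$$Z_l := \xi_{1,n} + \cdots + \xi_{l-1,n} + \eta_{l+1,n} + \cdots + \eta_{n,n},$$
$$X_1 := \eta_n, \qquad X_{n+1} := \zeta_n.$$

Then

$$X_{l+1} = Z_l + \xi_{l,n}, \qquad X_l = Z_l + \eta_{l,n}, \tag{8.5.2}$$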

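⁴ If $\eta \mathrel{\subset\!\!=} \Phi_{0,1}$ then $c_3 = \mathbf{E}|\eta|^3 = \frac{2}{\sqrt{2\pi}} \int_0^{\infty} x^3 e^{-x^2/2}\, dx = \frac{4}{\sqrt{2\pi}} \int_0^{\infty} t e^{-t}\, dt = \frac{4}{\sqrt{2\pi}}$.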




$$f(\zeta_n) - f(\eta_n) = \sum_{l=1}^{n} \bigl[ f(X_{l+1}) - f(X_l) \bigr]. \tag{8.5.3}$$


Now we will make use of the following lemma. 


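Lemma 8.5.1 Let $f \in C_3$ and $Z$, $\xi$ and $\eta$ be independent random variables with

$$\mathbf{E}\xi = \mathbf{E}\eta = a, \qquad \mathbf{E}\xi^2 = \mathbf{E}\eta^2 = \sigma^2, \qquad \mu^0 := \int |x|^3\, \bigl| d\bigl(F_{\xi}(x) - F_{\eta}(x)\bigr) \bigr| < \infty.$$

Then

$$\bigl| \mathbf{E}f(Z + \xi) - \mathbf{E}f(Z + \eta) \bigr| \le \frac{f_3 \mu^0}{6}. \tag{8.5.4}$$

Applying this lemma to (8.5.3), we get

$$\bigl| \mathbf{E}\bigl[ f(X_{l+1}) - f(X_l) \bigr] \bigr| \le \frac{f_3 \mu_{l,n}^0}{6},$$

which after summation gives (8.5.1). The theorem is proved.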


Thus to complete the argument proving Theorem 8.5.1 it remains to prove 
Lemma 8.5.1. 


Proof of Lemma 8.5.1 Set $g(x) := \mathbf{E}f(Z + x)$. It is evident that $g$, being the result
of the averaging of $f$, has all the smoothness properties of $f$ and, in particular,
$|g'''(x)| \le f_3$. By virtue of the independence of $Z$, $\xi$ and $\eta$, we have

$$\mathbf{E}f(Z + \xi) - \mathbf{E}f(Z + \eta) = \int g(x)\, d\bigl(F_{\xi}(x) - F_{\eta}(x)\bigr). \tag{8.5.5}$$

For the integrand, we make use of the expansion

$$g(x) = g(0) + x g'(0) + \frac{x^2}{2} g''(0) + \frac{x^3}{6} g'''(\theta_x), \qquad \theta_x \in [0, x].$$

Since the first and second moments of $\xi$ coincide with those of $\eta$, we obtain for the
right-hand side of (8.5.5) the bound

$$\Bigl| \int \frac{x^3}{6} g'''(\theta_x)\, d\bigl(F_{\xi}(x) - F_{\eta}(x)\bigr) \Bigr| \le \frac{f_3 \mu^0}{6}.$$

The lemma is proved.
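
As a sanity check of the weaker form of bound (8.5.1), consider a case where both sides are available in closed form. The sketch below is an illustration chosen here, not an example from the text: it takes $f = \cos$ (so that one may take $f_3 = 1$) and symmetric summands $\xi_{k,n} = \pm 1/\sqrt{n}$ with probability 1/2 each. Then $\mathbf{E}\cos(\zeta_n) = \cos^n(1/\sqrt{n})$, $\mathbf{E}\cos(\eta_n) = e^{-1/2}$, $L_3 = n^{-1/2}$ and $N_3 = c_3 n^{-1/2}$ with $c_3 = 4/\sqrt{2\pi}$.

```python
import math

C3 = 4.0 / math.sqrt(2.0 * math.pi)  # E|eta|^3 for a standard normal eta


def lhs(n: int) -> float:
    """|E cos(zeta_n) - E cos(eta_n)| for summands +-1/sqrt(n), probability 1/2 each."""
    # The ch.f. of zeta_n at t = 1 is real and equals cos(1/sqrt(n)) ** n.
    e_cos_zeta = math.cos(1.0 / math.sqrt(n)) ** n
    e_cos_eta = math.exp(-0.5)  # ch.f. of the standard normal law at t = 1
    return abs(e_cos_zeta - e_cos_eta)


def rhs(n: int) -> float:
    """Right-hand side f_3 (L_3 + N_3) / 6 of (8.5.1) with f_3 = 1."""
    l3 = 1.0 / math.sqrt(n)  # n summands, each with E|xi|^3 = n ** -1.5
    n3 = C3 / math.sqrt(n)   # n summands, each with E|eta|^3 = C3 * n ** -1.5
    return (l3 + n3) / 6.0


for n in (1, 10, 100, 1000):
    print(n, lhs(n), rhs(n))
```

In this symmetric example the actual difference decays like $1/n$, faster than the $n^{-1/2}$ rate of the bound, because the third moments of $\xi_{k,n}$ and $\eta_{k,n}$ both vanish.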


Remark 8.5.1 In exactly the same way one can establish the representation

$$\bigl| \mathbf{E}f(\zeta_n) - \mathbf{E}f(\eta_n) \bigr| \le \frac{f_3}{6} \sum_{k=1}^{n} \bigl| \mathbf{E}\bigl(\xi_{k,n}^3 - \eta_{k,n}^3\bigr) \bigr| + \frac{f_4 L_4^0}{24} \tag{8.5.6}$$

under obvious conventions for the notations $f_4$ and $L_4^0$. This bound can improve
upon (8.5.1) if the differences $\mathbf{E}(\xi_{k,n}^3 - \eta_{k,n}^3)$ are small. If, for instance,
$\xi_{k,n} = (\xi_k - a)/(\sigma\sqrt{n})$, $\xi_k$ are identically distributed, and the third moments of $\xi_{k,n}$ and
$\eta_{k,n}$ coincide, then on the right-hand side of (8.5.6) we will have a quantity of the
order $1/n$.


Theorem 8.5.1 extends Theorem 8.4.1 in the case when $s = 3$. The extension
is that, to establish convergence $\zeta_n \mathrel{\subset\!\!\Rightarrow} \Phi_{0,1}$, one no longer needs the negligibility
of $\xi_{k,n}$. If, for example, $\xi_{1,n} \mathrel{\subset\!\!=} \Phi_{0,1/2}$ (in that case $\mu_{1,n}^0 = 0$) and $L_3^0 \to 0$, then
$\mathbf{E}f(\zeta_n) \to \mathbf{E}f(\eta)$, $\eta \mathrel{\subset\!\!=} \Phi_{0,1}$, for any $f$ from the class $C_3$. Since $C_3$ is a distribution
determining class (see Chap. 6), it remains to make use of Corollary 6.3.2.

We can strengthen the above assertion. 


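Theorem 8.5.2 For any $x \in \mathbb{R}$,

$$\bigl| \mathbf{P}(\zeta_n < x) - \Phi(x) \bigr| \le c\, (L_3^0)^{1/4}, \tag{8.5.7}$$

where $c$ is an absolute constant.

Proof Take an arbitrary function $h \in C_3$, $0 \le h \le 1$, such that $h(x) = 1$ for $x \le 0$
and $h(x) = 0$ for $x \ge 1$, and put $h_3 = \sup_x |h'''(x)|$. Then, for the function
$f(x) = h((x - t)/\varepsilon)$, we will have $f_3 = \sup_x |f'''(x)| \le h_3/\varepsilon^3$, and by Theorem 8.5.1

$$\mathbf{P}(\zeta_n < t) \le \mathbf{E}f(\zeta_n) \le \mathbf{E}f(\eta) + \frac{f_3 L_3^0}{6} \le \mathbf{P}(\eta < t + \varepsilon) + \frac{h_3 L_3^0}{6\varepsilon^3} \le \mathbf{P}(\eta < t) + \frac{\varepsilon}{\sqrt{2\pi}} + \frac{h_3 L_3^0}{6\varepsilon^3}.$$

The last inequality holds since the maximum of the derivative of the normal
distribution function $\Phi(t) = \mathbf{P}(\eta < t)$ is equal to $1/\sqrt{2\pi}$. Establishing in the same way
the converse inequality and putting $\varepsilon = (L_3^0)^{1/4}$, we arrive at (8.5.7). The theorem
is proved.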


The bound in Theorem 8.5.2 is, of course, not the best one. And yet inequality
(8.5.7) shows that we will have a good normal approximation for $\mathbf{P}(\zeta_n < x)$ in the
large deviations range (i.e. for $|x| \to \infty$) as well, at least for those $x$ for which

$$\bigl(1 - \Phi(|x|)\bigr) (L_3^0)^{-1/4} \to \infty \tag{8.5.8}$$

as $n \to \infty$. Indeed, in that case, say, for $x = |x| > 0$,

$$\Bigl| \frac{\mathbf{P}(\zeta_n \ge x)}{1 - \Phi(x)} - 1 \Bigr| \le \frac{c\, (L_3^0)^{1/4}}{1 - \Phi(x)} \to 0.$$

Since by L'Hospital's rule

$$1 - \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\, dt \sim \frac{1}{\sqrt{2\pi}\, x} e^{-x^2/2} \quad \text{as } x \to \infty,$$

(8.5.8) holds for $|x| < c_1 \sqrt{-\ln L_3^0}$ with an appropriately chosen constant $c_1$.
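
The tail asymptotics obtained from L'Hospital's rule can be checked directly. The sketch below (function names are illustrative) computes $1 - \Phi(x)$ through the complementary error function, which stays accurate far in the tail, and divides it by $e^{-x^2/2}/(\sqrt{2\pi}\, x)$; the ratio should increase to 1 as $x$ grows.

```python
import math


def normal_tail(x: float) -> float:
    """1 - Phi(x), computed via erfc to avoid cancellation for large x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def tail_asymptotic(x: float) -> float:
    """The equivalent exp(-x^2 / 2) / (sqrt(2 pi) x), valid as x -> infinity."""
    return math.exp(-0.5 * x * x) / (math.sqrt(2.0 * math.pi) * x)


def ratio(x: float) -> float:
    """Quotient of the exact tail and its asymptotic equivalent."""
    return normal_tail(x) / tail_asymptotic(x)


for x in (1.0, 2.0, 5.0, 10.0):
    print(x, normal_tail(x), ratio(x))
```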

In Chap. 20 we will obtain an extension of Theorems 8.5.1 and 8.5.2. 

The problem of refinements and approximation rate bounds in the central limit 
theorem and other limit theorems is one of the most important in probability theory, 
because solving it will tell us how precise and efficient the applications of these 
theorems to practical problems will be. First of all, one has to find the true order of 
the decay of 


$$\Delta_n = \sup_x \bigl| \mathbf{P}(\zeta_n < x) - \Phi(x) \bigr|$$

in $n$ (or, say, in $L_3$ in the case of non-identically distributed variables). There
exist at least two approaches to finding sharp bounds for $\Delta_n$. The first one, the
so-called method of characteristic functions, is based on the unimprovable bound for
the closeness of the ch.f.s

$$\Bigl| \ln \varphi_{\zeta_n}(t) + \frac{t^2}{2} \Bigr| \le c L_3$$

that the reader can obtain by him/herself, using Lemma 7.4.1 and somewhat modifying
the argument in the proof of Theorem 8.4.1. The principal technical difficulties
here are in deriving, using the inversion formula, the same order of smallness for $\Delta_n$.

The second approach, the so-called method of compositions, has been illustrated
in the present section in Theorem 8.5.1 (the idea of the method is expressed, to a
certain extent, by relation (8.5.3)). It will be using just that method that we will
prove in Appendix 5 the following general result (Cramér–Berry–Esseen):


Theorem 8.5.3 If $\xi_{k,n} = (\xi_k - a_k)/B_n$, where $\xi_k$ do not depend on $n$, then

$$\sup_x \bigl| \mathbf{P}(\zeta_n < x) - \Phi(x) \bigr| \le c L_3,$$

where $c$ is an absolute constant.


In the case of identically distributed $\xi_k$ the right-hand side of the above inequality
becomes $c\mu_1/(\sigma^3\sqrt{n})$. It was established that in this case $(2\pi)^{-1/2} < c < 0.4774$,
while in the case of non-identically distributed summands $c < 0.5591$.⁵

One should keep in mind that the above theorems and the bounds for the constant 
c are universal and therefore hold under the most unfavourable conditions (from 
the point of view of the approximation). In real problems, the convergence rate is 
usually much better. 
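
For Bernoulli summands both sides of the bound in Theorem 8.5.3 can be computed exactly. The sketch below is an illustration chosen here, with illustrative function names; it uses the constant 0.4774 quoted above for identically distributed summands. It evaluates $\sup_x |\mathbf{P}(\zeta_n < x) - \Phi(x)|$ over the jump points of the binomial distribution function and compares it with $c\mu_1/(\sigma^3\sqrt{n})$.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sup_distance_bernoulli(n: int, p: float) -> float:
    """sup_x |P(zeta_n < x) - Phi(x)| for the normalised sum of n Bernoulli(p) variables."""
    q = 1.0 - p
    scale = math.sqrt(n * p * q)
    cdf_left = 0.0  # P(S_n < k)
    worst = 0.0
    for k in range(n + 1):
        x = (k - n * p) / scale
        pmf = math.comb(n, k) * p ** k * q ** (n - k)
        cdf_right = cdf_left + pmf  # P(S_n <= k)
        # Between jumps the step function is constant and Phi is monotone,
        # so the supremum is attained at a jump, from the left or from the right.
        worst = max(worst, abs(cdf_left - normal_cdf(x)), abs(cdf_right - normal_cdf(x)))
        cdf_left = cdf_right
    return worst


def berry_esseen_bound(n: int, p: float, c: float = 0.4774) -> float:
    """c * E|xi - p|^3 / (sigma^3 sqrt(n)) for Bernoulli(p) summands."""
    q = 1.0 - p
    third_abs_moment = p * q * (p * p + q * q)
    sigma_cubed = (p * q) ** 1.5
    return c * third_abs_moment / (sigma_cubed * math.sqrt(n))


for n, p in ((25, 0.5), (100, 0.1), (400, 0.3)):
    print(n, p, sup_distance_bernoulli(n, p), berry_esseen_bound(n, p))
```

Lattice distributions such as the binomial are close to the unfavourable case, since the jumps of the distribution function are themselves of order $n^{-1/2}$.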


⁵ See [33].




8.6 The Law of Large Numbers and the Central Limit Theorem 
in the Multivariate Case 


In this section we assume that $\xi_{1,n}, \ldots, \xi_{n,n}$ are random vectors in the triangular
array scheme,

$$\mathbf{E}\xi_{k,n} = 0, \qquad \zeta_n = \sum_{k=1}^{n} \xi_{k,n}.$$

The law of large numbers $\zeta_n \xrightarrow{p} 0$ follows immediately from Theorem 8.3.1, if
we assume that the components of $\xi_{k,n}$ satisfy the conditions of that theorem. Thus
we can assume that Theorem 8.3.1 was formulated and proved for vectors.

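Dealing with the central limit theorem is somewhat more complicated. Here we
will assume that $\mathbf{E}|\xi_{k,n}|^2 < \infty$, where $|x|^2 = (x, x)$ is the square of the norm of $x$. Let

$$\sigma_{k,n}^2 := \mathbf{E}\,\xi_{k,n}^{T} \xi_{k,n}, \qquad \sigma_n^2 := \sum_{k=1}^{n} \sigma_{k,n}^2$$

(the superscript $T$ denotes transposition, so that $\xi_{k,n}^{T}$ is a column vector).
Introduce the condition

$$\sum_{k=1}^{n} \mathbf{E} \min\bigl( |\xi_{k,n}|^2, |\xi_{k,n}|^s \bigr) \to 0, \quad s > 2, \tag*{[D2]}$$

and the Lindeberg condition

$$\sum_{k=1}^{n} \mathbf{E}\bigl( |\xi_{k,n}|^2;\ |\xi_{k,n}| > \tau \bigr) \to 0 \tag*{[M2]}$$

as $n \to \infty$ for any $\tau > 0$. As in the univariate case, we can easily verify that
conditions [D2] and [M2] are equivalent provided that $\operatorname{tr} \sigma_n^2 = \sum_j (\sigma_n^2)_{jj} < c < \infty$.


Theorem 8.6.1 If $\sigma_n^2 \to \sigma^2$, where $\sigma^2$ is a positive definite matrix, and condition
[D2] (or [M2]) is met, then

$$\zeta_n \mathrel{\subset\!\!\Rightarrow} \Phi_{0,\sigma^2}.$$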


Corollary 8.6.1 ("The conventional" central limit theorem) If $\xi_1, \xi_2, \ldots$ is a
sequence of independent identically distributed random vectors, $\mathbf{E}\xi_k = 0$,
$\sigma^2 = \mathbf{E}\,\xi_k^{T} \xi_k$ and $S_n = \sum_{k=1}^{n} \xi_k$ then, as $n \to \infty$,

$$\frac{S_n}{\sqrt{n}} \mathrel{\subset\!\!\Rightarrow} \Phi_{0,\sigma^2}.$$

This assertion is a consequence of Theorem 8.6.1, since the random variables
$\xi_{k,n} = \xi_k/\sqrt{n}$ satisfy its conditions.


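Proof of Theorem 8.6.1 Consider the characteristic functions

$$\varphi_{k,n}(t) := \mathbf{E} e^{i(t, \xi_{k,n})}, \qquad \varphi_n(t) := \mathbf{E} e^{i(t, \zeta_n)} = \prod_{k=1}^{n} \varphi_{k,n}(t).$$

In order to prove the theorem we have to verify that, for any $t$, as $n \to \infty$,

$$\varphi_n(t) \to \exp\Bigl\{ -\frac{1}{2} t \sigma^2 t^{T} \Bigr\}.$$

We make use of Theorem 8.4.1. We can interpret $\varphi_{k,n}(t)$ and $\varphi_n(t)$ as the ch.f.s

$$\varphi_{k,n}^{\theta}(v) = \mathbf{E} \exp(iv\xi_{k,n}^{\theta}), \qquad \varphi_n^{\theta}(v) = \mathbf{E} \exp(iv\zeta_n^{\theta})$$

of the random variables $\xi_{k,n}^{\theta} = (\xi_{k,n}, \theta)$, $\zeta_n^{\theta} = (\zeta_n, \theta)$, where $\theta = t/|t|$, $v = |t|$. Let
us show that the scalar random variables $\xi_{k,n}^{\theta}$ satisfy the conditions of Theorem 8.4.1
(or Corollary 8.4.1) for the univariate case. Clearly,

$$\mathbf{E}\xi_{k,n}^{\theta} = 0, \qquad \sum_{k=1}^{n} \mathbf{E}(\xi_{k,n}^{\theta})^2 = \sum_{k=1}^{n} \mathbf{E}(\xi_{k,n}, \theta)^2 = \theta \sigma_n^2 \theta^{T} \to \theta \sigma^2 \theta^{T} > 0.$$

That condition [D2] is satisfied follows from the obvious inequalities

$$(\xi_{k,n}, \theta)^2 = (\xi_{k,n}^{\theta})^2 \le |\xi_{k,n}|^2, \qquad \sum_{k=1}^{n} \mathbf{E} g_2(\xi_{k,n}^{\theta}) \le \sum_{k=1}^{n} \mathbf{E} g_2(|\xi_{k,n}|),$$

where $g_2(x) = \min(x^2, |x|^s)$, $s > 2$. Thus, for any $v$ and $\theta$ (i.e., for any $t$), by
Corollary 8.4.1 of Theorem 8.4.1

$$\varphi_n(t) = \mathbf{E} \exp\{iv\zeta_n^{\theta}\} \to \exp\Bigl\{ -\frac{1}{2} v^2 \theta \sigma^2 \theta^{T} \Bigr\} = \exp\Bigl\{ -\frac{1}{2} t \sigma^2 t^{T} \Bigr\}.$$

The theorem is proved.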


Theorem 8.6.1 does not cover the case where the entries of the matrix $\sigma_n^2$ grow
unboundedly or behave in such a way that the rank of the limiting matrix $\sigma^2$ becomes
less than the dimension of the vectors $\xi_{k,n}$. This can happen when the variances of
different components of $\xi_{k,n}$ have different orders of decay (or growth). In such a
case, one should consider the transformed sums $\zeta'_n = \zeta_n \sigma_n^{-1}$ instead of $\zeta_n$.
Theorem 8.6.1 is actually a consequence of the following more general assertion which,
in turn, follows from Theorem 8.6.1.


Theorem 8.6.2 If the random variables $\xi'_{k,n} = \xi_{k,n} \sigma_n^{-1}$ satisfy condition [D2] (or
[M2]) then $\zeta'_n \mathrel{\subset\!\!\Rightarrow} \Phi_{0,E}$, where $E$ is the identity matrix.




8.7 Integro-Local and Local Limit Theorems for Sums of 
Identically Distributed Random Variables with Finite 
Variance 


Theorem 8.2.1 from Sect. 8.2 is called the integral limit theorem. To understand 
the reasons for using such a name, one should compare this assertion with (more 
accurate) limit theorems of another type, that describe the asymptotic behaviour of 
the densities of the distributions of S, (if any) or the asymptotics of the probabilities 
of sums S;, hitting a fixed interval. It is natural to call the theorems for densities local 
theorems. Theorems similar to Theorem 8.2.1 can be obtained from the local ones 
(if the densities exist) by integrating, and it is natural to call them integral theorems. 
Assertions about the asymptotics of the probabilities of S,, hitting an interval are 
“intermediate” between the local and integral theorems, and it is natural to call them 
integro-local theorems. In the literature, such statements are often also referred to 
as local, apparently because they describe the probability of the localisation of the 
sum S, in a given interval. 


8.7.1 Integro-Local Theorems 


Integro-local theorems describe the asymptotics of

$$\mathbf{P}\bigl(S_n \in [x, x + \Delta)\bigr)$$

as $n \to \infty$ for a fixed $\Delta > 0$. Probabilities of this type for increasing $\Delta$ (or for
$\Delta = \infty$) can clearly be obtained by summing the corresponding probabilities for
fixed $\Delta$.

We will derive integro-local and local theorems with the help of the inversion
formulas from Sect. 7.2.

For the sake of brevity, put 


$$\Delta[x) := [x, x+\Delta)$$

and denote by $\phi(x) = \phi_{0,1}(x)$ the density of the standard normal distribution. Below
we will restrict ourselves to the investigation of the sums $S_n = \xi_1 + \cdots + \xi_n$ of
independent identically distributed random variables $\xi_k \overset{d}{=} \xi$.


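Theorem 8.7.1 (The Stone-Shepp integro-local theorem) Let $\xi$ be a non-lattice
random variable, $\mathbf{E}\xi = 0$ and $\mathbf{E}\xi^2 = \sigma^2 < \infty$. Then, for any fixed $\Delta > 0$, as
$n \to \infty$,

$$\mathbf{P}(S_n \in \Delta[x)) = \frac{\Delta}{\sigma\sqrt{n}}\,\phi\Big(\frac{x}{\sigma\sqrt{n}}\Big) + o\Big(\frac{1}{\sqrt{n}}\Big), \tag{8.7.1}$$

where the remainder term $o(1/\sqrt{n})$ is uniform in $x$.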
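A small numerical illustration of (8.7.1) may be useful here. The sketch below is not part of the original text. It assumes, purely for illustration, summands $\xi = \eta - 1$ with $\eta$ exponential with parameter 1 (a non-lattice law with $\mathbf{E}\xi = 0$, $\sigma = 1$), for which $S_n + n$ has the gamma distribution with integer shape $n$, so the interval probability can be computed exactly. All function names are hypothetical.

```python
import math

def erlang_sf(n, y):
    """P(Gamma(n, 1) > y) = sum_{k<n} e^{-y} y^k / k!, each term computed via logs."""
    if y <= 0:
        return 1.0
    log_y = math.log(y)
    return sum(math.exp(-y + k * log_y - math.lgamma(k + 1)) for k in range(n))

def interval_prob(n, x, delta):
    """Exact P(S_n in [x, x + delta)) for S_n = sum of n centred Exp(1) variables."""
    # Assumption of this sketch: S_n + n has the Gamma(n, 1) distribution.
    return erlang_sf(n, n + x) - erlang_sf(n, n + x + delta)

def normal_density(v):
    """Density of the standard normal law."""
    return math.exp(-v * v / 2) / math.sqrt(2 * math.pi)

def stone_shepp_term(n, x, delta, sigma=1.0):
    """Leading term Delta / (sigma sqrt(n)) * phi(x / (sigma sqrt(n))) of (8.7.1)."""
    s = sigma * math.sqrt(n)
    return delta / s * normal_density(x / s)

def scaled_error(n, x, delta):
    """sqrt(n) * |exact probability - leading term|; should tend to 0 as n grows."""
    return math.sqrt(n) * abs(interval_prob(n, x, delta) - stone_shepp_term(n, x, delta))

for n in (25, 100, 400):
    # Errors at x = 0 and at x = sqrt(n), i.e. at v = 0 and v = 1.
    print(n, scaled_error(n, 0.0, 1.0), scaled_error(n, math.sqrt(n), 1.0))
```

The printed errors are multiplied by $\sqrt{n}$, so their decay reflects the $o(1/\sqrt{n})$ remainder rather than the trivial decay of the probabilities themselves.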




Remark 8.7.1 Since relation (8.7.1) is valid for any fixed $\Delta$, it will also be valid
when $\Delta = \Delta_n \to 0$ slowly enough as $n \to \infty$. If $\Delta = \Delta_n$ grows then the
asymptotics of $\mathbf{P}(S_n \in \Delta[x))$ can be obtained by summing the right-hand sides of (8.7.1)
for, say, $\Delta = 1$ (if $\Delta_n \to \infty$ is integer-valued). Thus the integral theorem follows
from the integro-local one but not vice versa.


Remark 8.7.2 By virtue of the properties of densities (see Sect. 3.2), the right-hand
side of representation (8.7.1) has the same form as if the random variable $\zeta_n =
S_n/(\sigma\sqrt{n})$ had the density $\phi(v) + o(1)$, although the existence of the density of $S_n$
(or $\zeta_n$) is not assumed in the theorem.


Proof of Theorem 8.7.1 First prove the theorem under the simplifying assumption
that condition

$$\limsup_{|t|\to\infty}|\varphi(t)| < 1 \tag{8.7.2}$$

is satisfied (the Cramér condition on the ch.f.). Property 11 of ch.f.s (see Sect. 7.1)
implies that this condition is always met if the distribution of the sum $S_m$, for some
$m \ge 1$, has a positive absolutely continuous component. The proof of Theorem 8.7.1
in its general form is more complicated and will be given at the end of this section,
in Sect. 8.7.3.

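In order to use the inversion formula (7.2.8), we employ the "smoothing method"
and consider, along with $S_n$, the sums

$$Z_n = S_n + \eta_\delta, \tag{8.7.3}$$

where $\eta_\delta \in \mathbf{U}_{-\delta,0}$. Since the ch.f. $\varphi_{\eta_\delta}(t)$ of the random variable $\eta_\delta$, being equal to

$$\varphi_{\eta_\delta}(t) = \frac{1 - e^{-it\delta}}{it\delta}, \tag{8.7.4}$$

possesses the property that the function $\varphi_{\eta_\delta}(t)/t$ is integrable at infinity, for the
increments of the distribution function $G_n(x)$ of the random variable $Z_n$ (its ch.f.
divided by $t$ is integrable, too) we can use formula (7.2.8):

$$G_n(x+\Delta) - G_n(x) = \mathbf{P}(Z_n \in \Delta[x)) = \frac{1}{2\pi}\int \frac{e^{-itx} - e^{-it(x+\Delta)}}{it}\,\varphi^n(t)\,\varphi_{\eta_\delta}(t)\,dt
= \frac{\Delta}{2\pi}\int e^{-itx}\varphi^n(t)\,\widehat\varphi(t)\,dt, \tag{8.7.5}$$

where $\widehat\varphi(t) = \varphi_{\eta_\delta}(t)\varphi_{\eta_\Delta}(t)$ (cf. (7.2.8)) is the ch.f. of the sum of independent random
variables $\eta_\delta$ and $\eta_\Delta$. We obtain that the difference $G_n(x+\Delta) - G_n(x)$, up to the
factor $\Delta$, is nothing else but the value of the density of the random variable
$S_n + \eta_\delta + \eta_\Delta$ at the point $x$.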

Split the integral on the right-hand side of (8.7.5) into two subintegrals: one
over the domain $|t| < \gamma$ for some $\gamma < 1$, and the other over the complementary
domain. Put $x = v\sqrt{n}$ and consider first

$$I_1 := \int_{|t|<\gamma} e^{-itv\sqrt{n}}\varphi^n(t)\,\widehat\varphi(t)\,dt = \frac{1}{\sqrt{n}}\int_{|u|<\gamma\sqrt{n}} e^{-iuv}\varphi^n\Big(\frac{u}{\sqrt{n}}\Big)\widehat\varphi\Big(\frac{u}{\sqrt{n}}\Big)\,du.$$


Without loss of generality we can assume $\sigma = 1$, and by (8.2.1) obtain that

$$1 - \varphi(t) = \frac{t^2}{2} + o(t^2),$$

$$\ln\varphi(t) = \ln\big[1 - (1 - \varphi(t))\big] = -\frac{t^2}{2} + o(t^2) \quad \text{as } t \to 0. \tag{8.7.6}$$

Hence

$$n\ln\varphi\Big(\frac{u}{\sqrt{n}}\Big) = -\frac{u^2}{2} + h_n(u), \tag{8.7.7}$$

where $h_n(u) \to 0$ for any fixed $u$ as $n \to \infty$. Moreover, for $\gamma$ small enough, in the
domain $|u| < \gamma\sqrt{n}$ we have

$$|h_n(u)| \le \frac{u^2}{6},$$

so the right-hand side of (8.7.7) does not exceed $-u^2/3$. Now we can rewrite $I_1$ in
the form

$$I_1 = \frac{1}{\sqrt{n}}\int_{|u|<\gamma\sqrt{n}} \exp\Big\{-iuv - \frac{u^2}{2} + h_n(u)\Big\}\,\widehat\varphi\Big(\frac{u}{\sqrt{n}}\Big)\,du, \tag{8.7.8}$$

where $|\widehat\varphi(u/\sqrt{n})| \le 1$ and $\widehat\varphi(u/\sqrt{n}) \to 1$ for any fixed $u$ as $n \to \infty$. Therefore, by
virtue of the dominated convergence theorem,

$$\sqrt{n}\,I_1 \to \int \exp\Big\{-iuv - \frac{u^2}{2}\Big\}\,du \tag{8.7.9}$$

uniformly in $v$, since the integral on the right-hand side of (8.7.8) is uniformly
continuous in $v$. But the integral on the right-hand side of (8.7.9) is simply (up to the
factor $1/(2\pi)$) the result of applying the inversion formula to the ch.f. of the normal
distribution, so that

$$\lim_{n\to\infty}\sqrt{n}\,I_1 = \sqrt{2\pi}\,e^{-v^2/2}. \tag{8.7.10}$$


It remains to consider the integral

$$I_2 := \int_{|t|\ge\gamma} e^{-itv\sqrt{n}}\varphi^n(t)\,\widehat\varphi(t)\,dt.$$

By virtue of (8.7.2) and non-latticeness of the distribution of $\xi$,

$$q := \sup_{|t|\ge\gamma}|\varphi(t)| < 1 \tag{8.7.11}$$

and therefore

$$|I_2| \le q^n\int_{|t|\ge\gamma}|\widehat\varphi(t)|\,dt \le q^n c(\Delta,\delta), \qquad \lim_{n\to\infty}\sqrt{n}\,I_2 = 0 \tag{8.7.12}$$

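uniformly in $v$, where $c(\Delta,\delta)$ depends on $\Delta$ and $\delta$ only. We have established that,
for $x = v\sqrt{n}$, as $n \to \infty$, the relations

$$I_1 + I_2 = \sqrt{\frac{2\pi}{n}}\,e^{-v^2/2} + o\Big(\frac{1}{\sqrt{n}}\Big),$$

$$\mathbf{P}(Z_n \in \Delta[x)) = \frac{\Delta}{\sqrt{2\pi n}}\,e^{-v^2/2} + o\Big(\frac{1}{\sqrt{n}}\Big) \tag{8.7.13}$$

hold uniformly in $v$ (see (8.7.5)). This means that representation (8.7.13) holds
uniformly for all $x$.
Further, by (8.7.3),

$$\{Z_n \in [x, x+\Delta-\delta)\} \subset \{S_n \in \Delta[x)\} \subset \{Z_n \in [x-\delta, x+\Delta)\} \tag{8.7.14}$$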


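and so, in particular,

$$\mathbf{P}(S_n \in \Delta[x)) \le \frac{\Delta+\delta}{\sqrt{2\pi n}}\,e^{-(x-\delta)^2/(2n)} + o\Big(\frac{1}{\sqrt{n}}\Big) = \frac{\Delta+\delta}{\sqrt{2\pi n}}\,e^{-x^2/(2n)} + o\Big(\frac{1}{\sqrt{n}}\Big).$$

By (8.7.14) an analogous converse inequality also holds. Since $\delta$ is arbitrary, this
is possible only if

$$\mathbf{P}(S_n \in \Delta[x)) = \frac{\Delta}{\sqrt{2\pi n}}\,e^{-x^2/(2n)} + o\Big(\frac{1}{\sqrt{n}}\Big). \tag{8.7.15}$$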


The theorem is proved. 


8.7.2 Local Theorems 


If the distribution of $S_n$ has a density then we can obtain local theorems on the
asymptotics of this density.


Theorem 8.7.2 Let $\mathbf{E}\xi = 0$, $\mathbf{E}\xi^2 = \sigma^2 < \infty$ and suppose there exists an $m \ge 1$
such that at least one of the following three conditions is met:

(a) the distribution of $S_m$ has a bounded density;
(b) the distribution of $S_m$ has a density from $L_2$;
(c) the ch.f. $\varphi^m(t)$ of the sum $S_m$ is integrable.

Then, for $n \ge m$, the distribution of the sum $S_n$ has density $f_{S_n}(x)$ for which the
representation

$$f_{S_n}(x) = \frac{1}{\sigma\sqrt{2\pi n}}\exp\Big\{-\frac{x^2}{2n\sigma^2}\Big\} + o\Big(\frac{1}{\sqrt{n}}\Big) \tag{8.7.16}$$

holds uniformly in $x$ as $n \to \infty$.
Conditions (a)-(c) are equivalent to each other (possibly with different values
of $m$).


Proof We first establish the equivalence of (a)-(c). The fact that a bounded density
belongs to $L_2$ was proved in Sect. 7.2.3. Conversely, if $f \in L_2$ then, by the
Cauchy-Bunjakovsky inequality,

$$\big|f^{(2)*}(t)\big| = \Big|\int f(u)f(t-u)\,du\Big| \le \Big[\int f^2(u)\,du \times \int f^2(t-u)\,du\Big]^{1/2} = \int f^2(u)\,du < \infty.$$

Hence the relationship $f_{S_m} \in L_2$ implies the boundedness of $f_{S_{2m}}$, and thus (a) and
(b) are equivalent.

If $\varphi^m$ is integrable then by Theorem 7.2.2 the density $f_{S_m}$ exists and is bounded.
Conversely, if $f_{S_m}$ is bounded then $f_{S_m} \in L_2$, $\varphi_{S_m} \in L_2$ and $\varphi_{S_{2m}} \in L_1$ (see
Sect. 7.2). This proves the equivalence of (a) and (c).

We will now prove (8.7.16). By the inversion formula (7.2.1),

$$f_{S_n}(x) = \frac{1}{2\pi}\int e^{-itx}\varphi^n(t)\,dt.$$

Here the integral on the right-hand side does not "qualitatively" differ from the
integral on the right-hand side of (8.7.5): we only have to put $\widehat\varphi(t) \equiv 1$ in the part
$I_1$ of the integral (8.7.5) (the integral over the set $|t| < \gamma$), and, in the part $I_2$ (over
the set $|t| \ge \gamma$), to replace the integrable function $\widehat\varphi(t)$ with the integrable function
$\varphi^m(t)$ and to replace the function $\varphi^n(t)$ with $\varphi^{n-m}(t)$. After these changes the whole
argument in the proof of relation (8.7.13) remains valid, and therefore the same
relation (up to the factor $\Delta$) will hold for

$$f_{S_n}(x) = \frac{1}{\sigma\sqrt{2\pi n}}\exp\Big\{-\frac{x^2}{2n\sigma^2}\Big\} + o\Big(\frac{1}{\sqrt{n}}\Big).$$


The theorem is proved. 


Theorem 8.7.2 implies that the density $f_{\zeta_n}$ of the random variable $\zeta_n = S_n/(\sigma\sqrt{n})$
converges to the density $\phi$ of the standard normal law:

$$f_{\zeta_n}(v) \to \phi(v)$$

uniformly in $v$ as $n \to \infty$.




For instance, the density of the uniform distribution over $[-1, 1]$ satisfies the
conditions of this theorem, and hence the density of $S_n$ at the point $x = v\sigma\sqrt{n}$
($\sigma^2 = 1/3$) will behave as $\frac{1}{\sigma\sqrt{2\pi n}}e^{-v^2/2}$ (cf. the remark to Example 3.6.1).
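The uniform example can be checked numerically through the inversion formula itself. The following sketch is an illustration added here, not the author's computation. It assumes the ch.f. $\sin t/t$ of the uniform law on $[-1,1]$ and evaluates $f_{S_n}(x) = \frac{1}{\pi}\int_0^\infty \cos(tx)\,\varphi^n(t)\,dt$ (valid because the ch.f. is real and even) by a truncated trapezoidal rule. The truncation point and step size are ad hoc choices that are adequate only for moderately large $n$.

```python
import math

SIGMA = 1.0 / math.sqrt(3.0)  # standard deviation of the uniform law on [-1, 1]

def chf_uniform(t):
    """Ch.f. of the uniform distribution on [-1, 1]: sin(t) / t (equal to 1 at t = 0)."""
    return 1.0 if t == 0.0 else math.sin(t) / t

def density_of_sum(n, x, t_max=40.0, h=1e-3):
    """Approximate f_{S_n}(x) = (1/pi) int_0^inf cos(t x) phi(t)^n dt (trapezoidal rule)."""
    steps = int(t_max / h)
    # End-point terms of the trapezoidal rule (the integrand equals 1 at t = 0).
    total = 0.5 * (1.0 + math.cos(t_max * x) * chf_uniform(t_max) ** n)
    for k in range(1, steps):
        t = k * h
        total += math.cos(t * x) * chf_uniform(t) ** n
    return total * h / math.pi

def local_normal_term(n, v):
    """Leading term of (8.7.16) at the point x = v * sigma * sqrt(n)."""
    return math.exp(-v * v / 2) / (SIGMA * math.sqrt(2 * math.pi * n))

def scaled_gap(n, v):
    """sqrt(n) * |density - leading term| at x = v * sigma * sqrt(n)."""
    x = v * SIGMA * math.sqrt(n)
    return math.sqrt(n) * abs(density_of_sum(n, x) - local_normal_term(n, v))

for n in (12, 48):
    print(n, scaled_gap(n, 0.0), scaled_gap(n, 1.0))
```

The gap is scaled by $\sqrt{n}$ so that its decay corresponds to the remainder $o(1/\sqrt{n})$ in (8.7.16).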

In the arithmetic case, where the random variable $\xi$ is integer-valued and the
greatest common divisor of all possible values of $\xi$ equals 1 (see Sect. 7.1), it is the
asymptotics of the probabilities $\mathbf{P}(S_n = x)$ for integer $x$ that become the subject of
interest for local theorems. In this case we cannot assume without loss of generality
that $\mathbf{E}\xi = 0$.


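Theorem 8.7.3 (Gnedenko) Let $\xi$ have an arithmetic distribution with $\mathbf{E}\xi = a$ and
$\operatorname{Var}(\xi) = \sigma^2 < \infty$. Then, uniformly over all integers $x$, as $n \to \infty$,

$$\mathbf{P}(S_n = x) = \frac{1}{\sigma\sqrt{2\pi n}}\exp\Big\{-\frac{(x-an)^2}{2n\sigma^2}\Big\} + o\Big(\frac{1}{\sqrt{n}}\Big). \tag{8.7.17}$$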


Proof When proving limit theorems for arithmetic $\xi$, it is more convenient to use
the generating functions (see Sects. 7.1, 7.7)

$$p(z) \equiv p_\xi(z) := \mathbf{E}z^\xi, \qquad |z| = 1,$$

so that $p(e^{it}) = \varphi(t)$, where $\varphi$ is the ch.f. of $\xi$.
In this case the inversion formulas take the following form (see (7.2.10)): for
integer $x$,

$$\mathbf{P}(\xi = x) = \frac{1}{2\pi i}\int_{|z|=1} z^{-x-1}p(z)\,dz,$$

$$\mathbf{P}(S_n = x) = \frac{1}{2\pi i}\int_{|z|=1} z^{-x-1}p^n(z)\,dz = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-itx}\varphi^n(t)\,dt.$$


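As in the proof of Theorem 8.7.1, here we split the integral on the right-hand side
into two subintegrals: over the domain $|t| < \gamma$ and over the complementary set. The
treatment of the first subintegral

$$I_1 := \int_{|t|<\gamma} e^{-itx}\varphi^n(t)\,dt = \int_{|t|<\gamma} e^{-ity}\big[e^{-ita}\varphi(t)\big]^n\,dt$$

for $y = x - an$ differs from the considerations for $I_1$ in Theorem 8.7.1 only in that
it is simpler and yields (see (8.7.10))

$$I_1 = \frac{\sqrt{2\pi}}{\sigma\sqrt{n}}\exp\Big\{-\frac{y^2}{2n\sigma^2}\Big\} + o\Big(\frac{1}{\sqrt{n}}\Big).$$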


Similarly, the treatment of the second subintegral differs from that of $I_2$ in
Theorem 8.7.1 in that it becomes simpler, since the range of integration here is compact
and on that one has

$$|\varphi(t)| \le q(\gamma) < 1. \tag{8.7.18}$$

Therefore, as in Theorem 8.7.1,

$$\mathbf{P}(S_n = x) = \frac{1}{2\pi}(I_1 + I_2) = \frac{1}{\sigma\sqrt{2\pi n}}\exp\Big\{-\frac{y^2}{2n\sigma^2}\Big\} + o\Big(\frac{1}{\sqrt{n}}\Big).$$


The theorem is proved. 


Evidently, for the values of $y$ of order $\sqrt{n}$ Theorem 8.7.3 is a generalisation of
the local limit theorem for the Bernoulli scheme (see Corollary 5.2.1).
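For the Bernoulli scheme the statement of Theorem 8.7.3 can be compared with exact probabilities. The sketch below is an added illustration under the assumption that $\xi$ takes the values 1 and 0 with probabilities $p$ and $1-p$, so that $a = p$, $\sigma^2 = p(1-p)$ and $S_n$ is binomial. It evaluates the supremum over $x \in \{0,\dots,n\}$ of the scaled error; function names are hypothetical.

```python
import math

def binom_pmf(n, k, p):
    """Exact P(S_n = k) for Bernoulli(p) summands, computed through log-gamma."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def gnedenko_term(n, x, a, sigma):
    """Leading term of (8.7.17): normal density with mean a*n and variance n*sigma^2."""
    return (math.exp(-(x - a * n) ** 2 / (2 * n * sigma ** 2))
            / (sigma * math.sqrt(2 * math.pi * n)))

def sup_scaled_error(n, p):
    """max over x in 0..n of sqrt(n) * |P(S_n = x) - leading term|."""
    a, sigma = p, math.sqrt(p * (1 - p))
    return max(math.sqrt(n) * abs(binom_pmf(n, x, p) - gnedenko_term(n, x, a, sigma))
               for x in range(n + 1))

for n in (100, 1000, 10000):
    print(n, sup_scaled_error(n, 0.3))
```

Outside the range $0 \le x \le n$ the exact probability is zero and the normal term is negligible for these $n$, so restricting the maximum to that range loses nothing essential.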


8.7.3 The Proof of Theorem 8.7.1 in the General Case 


To prove Theorem 8.7.1 in the general case we will use the same approach as in
Sect. 8.7.1. We will again employ the smoothing method, but now, when specifying
the random variable $Z_n$ in (8.7.3), we will take $\theta\eta$ instead of $\eta_\delta$, where $\theta = \mathrm{const}$ and
$\eta$ is a random variable with the ch.f. from Example 7.2.1 (see the end of Sect. 7.2)
equal to

$$\varphi_\eta(t) = \begin{cases} 1 - |t|, & |t| \le 1;\\ 0, & |t| > 1,\end{cases}$$

so that for $Z_n = S_n + \theta\eta$, similarly to (8.7.5), we have

$$\mathbf{P}(Z_n \in \Delta[x)) = \frac{\Delta}{2\pi}\int_{|t|\le 1/\theta} e^{-itx}\varphi^n(t)\,\varphi_{\theta\eta}(t)\,\varphi_{\eta_\Delta}(t)\,dt, \tag{8.7.19}$$

where $\varphi_{\theta\eta}(t) = \max(0, 1 - \theta|t|)$. As in Sect. 8.7.1, split the integral on the
right-hand side of (8.7.19) into two subintegrals: $I_1$ over the domain $|t| < \gamma$ and $I_2$ over
the domain $\gamma \le |t| \le 1/\theta$. The asymptotic behaviour of these integrals is investigated
in almost the same way as in Sect. 8.7.1, but is somewhat simpler, since the
domain of integration in $I_2$ is compact, and so, by the non-latticeness of $\xi$, one has
on it the upper bound

$$q := \sup_{\gamma\le|t|\le 1/\theta}|\varphi(t)| < 1. \tag{8.7.20}$$

Therefore, to bound $I_2$ we no longer need condition (8.7.2).
Thus we have established, as above, relation (8.7.13). 
To derive from this fact the required relation (8.7.15) we will need the following. 


Lemma 8.7.1 Let $f(y)$ be a bounded uniformly continuous function, $\eta$ an arbitrary
proper random variable independent of $S_n$ and $b(n) \to \infty$ as $n \to \infty$. If, for any
fixed $\Delta > 0$ and $\theta > 0$, as $n \to \infty$, we have

$$\mathbf{P}(S_n + \theta\eta \in \Delta[x)) = \frac{\Delta}{b(n)}\Big[f\Big(\frac{x}{b(n)}\Big) + o(1)\Big], \tag{8.7.21}$$

then

$$\mathbf{P}(S_n \in \Delta[x)) = \frac{\Delta}{b(n)}\Big[f\Big(\frac{x}{b(n)}\Big) + o(1)\Big]. \tag{8.7.22}$$

In this assertion we can take $S_n$ to be any sequence of random variables satisfying
(8.7.21). In this section we will set $b(n)$ to be equal to $\sqrt{n}$, but later (see the proof
of Theorem A7.2.1 in Appendix 7) we will need some other sequences as well.


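Proof Put $\theta := \delta^2\Delta$, where $\delta > 0$ will be chosen later, $\Delta_\pm := (1 \pm 2\delta)\Delta$,
$\Delta_\pm[x) := [x, x + \Delta_\pm)$ and $f_0 := \max f(y)$. We first obtain an upper bound for
$\mathbf{P}(S_n \in \Delta[x))$. We have

$$\mathbf{P}(Z_n \in \Delta_+[x - \Delta\delta)) \ge \mathbf{P}(Z_n \in \Delta_+[x - \Delta\delta);\ |\eta| < 1/\delta).$$

On the event $|\eta| < 1/\delta$ one has $-\delta\Delta < \theta\eta < \delta\Delta$, and hence on this event

$$\{Z_n \in \Delta_+[x - \Delta\delta)\} \supset \{S_n \in \Delta[x)\}.$$

Thus, by independence of $\eta$ and $S_n$,

$$\mathbf{P}(Z_n \in \Delta_+[x - \Delta\delta)) \ge \mathbf{P}(S_n \in \Delta[x);\ |\eta| < 1/\delta) = \mathbf{P}(S_n \in \Delta[x))(1 - h(\delta)),$$

where $h(\delta) := \mathbf{P}(|\eta| \ge 1/\delta) \to 0$ as $\delta \to 0$. By condition (8.7.21) and the uniform
continuity of $f$ we obtain

$$\mathbf{P}(S_n \in \Delta[x)) \le \mathbf{P}(Z_n \in \Delta_+[x - \Delta\delta))(1 - h(\delta))^{-1}
\le \Big[\frac{\Delta}{b(n)}f\Big(\frac{x}{b(n)}\Big) + \frac{2\delta\Delta f_0}{b(n)} + o\Big(\frac{1}{b(n)}\Big)\Big](1 - h(\delta))^{-1}. \tag{8.7.23}$$

If, for a given $\varepsilon > 0$, we choose $\delta > 0$ such that

$$(1 - h(\delta))^{-1} \le 1 + \frac{\varepsilon}{3f_0}, \qquad 2\delta f_0 \le \frac{\varepsilon}{3},$$

then we derive from (8.7.23) that, for all $n$ large enough and $\varepsilon$ small enough,

$$\mathbf{P}(S_n \in \Delta[x)) \le \frac{\Delta}{b(n)}\Big(f\Big(\frac{x}{b(n)}\Big) + \varepsilon\Big). \tag{8.7.24}$$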




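This implies, in particular, that for all $x$,

$$\mathbf{P}(S_n \in \Delta[x)) \le \frac{\Delta}{b(n)}(f_0 + \varepsilon). \tag{8.7.25}$$

Now we will obtain a lower bound for $\mathbf{P}(S_n \in \Delta[x))$. For the event

$$A := \{Z_n \in \Delta_-[x + \Delta\delta)\}$$

we have

$$\mathbf{P}(A) = \mathbf{P}(A;\ |\eta| < 1/\delta) + \mathbf{P}(A;\ |\eta| \ge 1/\delta). \tag{8.7.26}$$

On the event $|\eta| < 1/\delta$ we have

$$\{Z_n \in \Delta_-[x + \Delta\delta)\} \subset \{S_n \in \Delta[x)\},$$

and hence

$$\mathbf{P}(A;\ |\eta| < 1/\delta) \le \mathbf{P}(S_n \in \Delta[x)). \tag{8.7.27}$$

Further, by independence of $\eta$ and $S_n$ and inequality (8.7.25),

$$\mathbf{P}(A;\ |\eta| \ge 1/\delta) = \mathbf{E}\big[\mathbf{P}(A \mid \eta);\ |\eta| \ge 1/\delta\big]
= \mathbf{E}\big[\mathbf{P}(S_n \in \Delta_-[x - \theta\eta + \Delta\delta) \mid \eta);\ |\eta| \ge 1/\delta\big]
\le \frac{\Delta}{b(n)}(f_0 + \varepsilon)h(\delta).$$

Therefore, combining (8.7.26), (8.7.27) and (8.7.21), we get

$$\mathbf{P}(S_n \in \Delta[x)) \ge \frac{\Delta}{b(n)}f\Big(\frac{x}{b(n)}\Big) - \frac{2\delta\Delta f_0}{b(n)} + o\Big(\frac{1}{b(n)}\Big) - \frac{\Delta}{b(n)}(f_0 + \varepsilon)h(\delta).$$

In addition, choosing $\delta$ such that

$$f_0 h(\delta) < \frac{\varepsilon}{3}, \qquad 2\delta f_0 < \frac{\varepsilon}{3},$$

we obtain that, for all $n$ large enough and $\varepsilon$ small enough,

$$\mathbf{P}(S_n \in \Delta[x)) \ge \frac{\Delta}{b(n)}\Big(f\Big(\frac{x}{b(n)}\Big) - \varepsilon\Big). \tag{8.7.28}$$

Since $\varepsilon$ is arbitrarily small, inequalities (8.7.24) and (8.7.28) prove the required
relation (8.7.22). The lemma is proved.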


To prove the theorem it remains to apply Lemma 8.7.1 in the case (see (8.7.13))
where $f = \phi$ and $b(n) = \sqrt{n}$. Theorem 8.7.1 is proved.




8.7.4 Uniform Versions of Theorems 8.7.1-8.7.3 for Random 
Variables Depending on a Parameter 


In the next chapter, we will need uniform versions of Theorems 8.7.1–8.7.3, where
the summands $\xi_k$ depend on a parameter $\lambda$. Denote such summands by $\xi_{(\lambda)k}$, the
corresponding distributions by $\mathbf{F}_{(\lambda)}$, and put

$$S_{(\lambda)n} := \sum_{k=1}^{n}\xi_{(\lambda)k},$$

where $\xi_{(\lambda)k}$ are independent copies of $\xi_{(\lambda)} \in \mathbf{F}_{(\lambda)}$. If $\lambda$ is only determined by the
number of summands $n$ then we will be dealing with the triangular array scheme
considered in Sects. 8.3–8.6 (the summands there were denoted by $\xi_{k,n}$). In the
general case we will take the segment $[0, \lambda_1]$ for some $\lambda_1 > 0$ as the parametric set,
keeping in mind that $\lambda \in [0, \lambda_1]$ may depend on $n$ (in the triangular array scheme
one can put $\lambda = 1/n$).

We will be interested in what conditions must be imposed on a family of
distributions $\mathbf{F}_{(\lambda)}$ for the assertions of Theorems 8.7.1–8.7.3 to hold uniformly in
$\lambda \in [0, \lambda_1]$. We introduce the following notation:

$$a(\lambda) = \mathbf{E}\xi_{(\lambda)}, \qquad \sigma^2(\lambda) = \operatorname{Var}(\xi_{(\lambda)}), \qquad \varphi_{(\lambda)}(t) = \mathbf{E}e^{it\xi_{(\lambda)}}.$$

The next assertion is an analogue of Theorem 8.7.1.


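Theorem 8.7.1A Let the distributions $\mathbf{F}_{(\lambda)}$ satisfy the following properties:
$0 < \sigma_1 < \sigma(\lambda) < \sigma_2 < \infty$, where $\sigma_1$ and $\sigma_2$ do not depend on $\lambda$:

(a) the relation

$$\varphi_{(\lambda)}(t) - 1 - ia(\lambda)t + \frac{t^2 m_2(\lambda)}{2} = o(t^2), \qquad m_2(\lambda) := \mathbf{E}\xi_{(\lambda)}^2, \tag{8.7.29}$$

holds uniformly in $\lambda \in [0, \lambda_1]$ as $t \to 0$, i.e. there exist a $t_0 > 0$ and a function
$\varepsilon(t) \to 0$ as $t \to 0$, independent of $\lambda$, such that, for all $|t| \le t_0$, the absolute
value of the left-hand side of (8.7.29) does not exceed $\varepsilon(t)t^2$;
(b) for any fixed $0 < \theta_1 < \theta_2 < \infty$,

$$q_{(\lambda)} := \sup_{\theta_1\le|t|\le\theta_2}|\varphi_{(\lambda)}(t)| \le q < 1, \tag{8.7.30}$$

where $q$ does not depend on $\lambda$.

Then, for each fixed $\Delta > 0$,

$$\mathbf{P}(S_{(\lambda)n} - na(\lambda) \in \Delta[x)) = \frac{\Delta}{\sigma(\lambda)\sqrt{n}}\,\phi\Big(\frac{x}{\sigma(\lambda)\sqrt{n}}\Big) + o\Big(\frac{1}{\sqrt{n}}\Big), \tag{8.7.31}$$

where the remainder term $o(1/\sqrt{n})$ is uniform in $x$ and $\lambda \in [0, \lambda_1]$.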




Proof Going through the proof of Theorem 8.7.1 in its general form (see Sect. 8.7.3),
we see that, to ensure the validity of all the proofs of the intermediate assertions in
their uniform forms, it suffices to have uniformity in the following two places:

(a) the uniformity in $\lambda$ of the estimate $o(t^2)$ as $t \to 0$ in relation (8.7.6) for the
expansion of the ch.f. of the random variable $\xi = \frac{\xi_{(\lambda)} - a(\lambda)}{\sigma(\lambda)}$;
(b) the uniformity in relation (8.7.20) for the same ch.f.

We verify the uniformity in (8.7.6). For $\varphi(t) = \mathbf{E}e^{it\xi}$ we have by (8.7.29)

$$\ln\varphi(t) = -\frac{ita(\lambda)}{\sigma(\lambda)} + \ln\varphi_{(\lambda)}\Big(\frac{t}{\sigma(\lambda)}\Big)
= -\frac{t^2\big(m_2(\lambda) - a^2(\lambda)\big)}{2\sigma^2(\lambda)} + o(t^2) = -\frac{t^2}{2} + o(t^2),$$

where the remainder term is uniform in $\lambda$.
The uniformity in relation (8.7.20) clearly follows from condition (b), since $\sigma(\lambda)$
is uniformly separated from both 0 and $\infty$. The theorem is proved.


Remark 8.7.3 Conditions (a) and (b) of Theorem 8.7.1A are essential for (8.7.31)
to hold. To see this, consider random variables $\xi$ and $\eta$ with fixed distributions,
$\mathbf{E}\xi = \mathbf{E}\eta = 0$ and $\mathbf{E}\xi^2 = \mathbf{E}\eta^2 = 1$. Let $\lambda \in [0, 1]$ and the random variable $\xi_{(\lambda)}$ be
defined by

$$\xi_{(\lambda)} := \begin{cases}\xi & \text{with probability } 1 - \lambda,\\ \eta/\sqrt{\lambda} & \text{with probability } \lambda,\end{cases} \tag{8.7.32}$$

so that $\mathbf{E}\xi_{(\lambda)} = 0$ and $\operatorname{Var}(\xi_{(\lambda)}) = 2 - \lambda$ (in the case of the triangular array scheme
one can put $\lambda = 1/n$). Then, under the obvious notational conventions, for $\lambda = t^2$,
$t \to 0$, we have

$$\varphi_{(\lambda)}(t) = (1 - \lambda)\varphi_\xi(t) + \lambda\varphi_\eta\Big(\frac{t}{\sqrt{\lambda}}\Big) = 1 - \frac{3t^2}{2} + o(t^2) + t^2\varphi_\eta(1).$$

This implies that (8.7.29) does not hold and hence condition (a) is not met for the
values of $\lambda$ in the vicinity of zero. At the same time, the uniform versions of relation
(8.7.31) and the central limit theorem will fail to hold. Indeed, putting $\lambda = 1/n$, we
obtain the triangular array scheme, in which the number $\nu_n$ of the summands of the
form $\eta_i/\sqrt{\lambda}$ in the sum $S_{(\lambda)n} = \sum_{i=1}^{n}\xi_{(\lambda)i}$ converges in distribution to $\nu \in \mathbf{\Pi}_1$ and

$$\frac{S_{(\lambda)n}}{\sqrt{n(2-\lambda)}} \overset{d}{=} \frac{S_{n-\nu_n}}{\sqrt{2n-1}} + \frac{H_{\nu_n}}{\sqrt{2 - 1/n}}, \quad \text{where } H_k = \sum_{i=1}^{k}\eta_i.$$

The first term on the right-hand side weakly converges in distribution to $\zeta \in \Phi_{0,1/2}$,
while the second term converges to $H_\nu/\sqrt{2}$. Clearly, the sum of these independent
summands is, generally speaking, not distributed normally with parameters (0, 1).




To see that condition (b) is also essential, consider an arithmetic random variable
$\xi$ with $\mathbf{E}\xi = 0$ and $\operatorname{Var}(\xi) = 1$, take $\eta$ to be a random variable with the uniform
distribution $\mathbf{U}_{-1,1}$, and put

$$\xi_{(\lambda)} := \begin{cases}\xi & \text{with probability } 1 - \lambda,\\ \eta & \text{with probability } \lambda.\end{cases}$$

Here the random variable $\xi_{(\lambda)}$ is non-lattice (its distribution has an absolutely
continuous component), but

$$\varphi_{(\lambda)}(2\pi) = (1 - \lambda) + \lambda\varphi_\eta(2\pi), \qquad q_{(\lambda)} \ge 1 - 2\lambda.$$

Again putting $\lambda = 1/n$, we get the triangular array scheme for which condition (b)
is not met. Relation (8.7.31) does not hold either, since, in the previous notation, the
sum $S_{(\lambda)n}$ is integer-valued with probability $\mathbf{P}(\nu_n = 0) \to e^{-1}$, so that its distribution
will have atoms at integer points with probabilities comparable, by Theorem 8.7.3,
with the right-hand side of (8.7.31). This clearly contradicts (8.7.31).

If we put $\lambda = 1/n^2$ then the sum $S_{(\lambda)n}$ will be integer-valued with probability
$(1 - 1/n^2)^n \to 1$, and the failure of relation (8.7.31) becomes even more evident.

Uniform versions of the local Theorems 8.7.2 and 8.7.3 are established in a com- 
pletely analogous way. 


Theorem 8.7.2A Let the distributions $\mathbf{F}_{(\lambda)}$ satisfy the conditions of Theorem 8.7.1A
with $\theta_2 = \infty$ and the conditions of Theorem 8.7.2, in which conditions (a)-(c) are
understood in the uniform sense (i.e., $\max_x f_{S_{(\lambda)m}}(x)$ or the norm of $f_{S_{(\lambda)m}}$ in $L_2$ or
$\int|\varphi_{(\lambda)}^m(t)|\,dt$ are bounded uniformly in $\lambda \in [0, \lambda_1]$).

Then representation (8.7.16) holds for $f_{S_{(\lambda)n}}(x)$ uniformly in $x$ and $\lambda$, provided
that on its right-hand side we replace $\sigma$ by $\sigma(\lambda)$.


Proof The conditions of Theorem 8.7.2A are such that they enable one to obtain 
the proof of the uniform version without any noticeable changes in the arguments 
proving Theorems 8.7.1A and 8.7.2. 


The following assertion is established in the same way. 


Theorem 8.7.3A Let the arithmetic distributions $\mathbf{F}_{(\lambda)}$ satisfy the conditions of
Theorem 8.7.1A for $\theta_2 = \pi$. Then representation (8.7.17) holds uniformly in $x$ and $\lambda$,
provided that $a$ and $\sigma$ on its right-hand side are replaced with $a(\lambda)$ and $\sigma(\lambda)$,
respectively.


Remark 8.7.3 applies to Theorems 8.7.2A and 8.7.3A as well. 


8.8 Convergence to Other Limiting Laws 


As we saw in previous sections, the normal law occupies a special place among all
distributions: it is the limiting law for normed sums of arbitrarily distributed random
variables. There arises the natural question of whether there exist any other limiting
laws for sums of independent random variables.

It is clear from the proof of Theorem 8.2.1 for identically distributed random
variables that the character of the limiting law is determined by the behaviour of the
ch.f. of the summands in the vicinity of 0. If $\mathbf{E}\xi = 0$ and $\mathbf{E}\xi^2 = \sigma^2 = -\varphi''(0)$ exist,
then

$$\varphi\Big(\frac{t}{\sqrt{n}}\Big) = 1 + \frac{\varphi''(0)t^2}{2n} + o\Big(\frac{1}{n}\Big),$$

and this determines the asymptotic behaviour of the ch.f. of $S_n/\sqrt{n}$, equal to
$\varphi^n(t/\sqrt{n})$, which leads to the normal limiting law. Therefore, if one is looking for
different limiting laws for the sums $S_n = \xi_1 + \cdots + \xi_n$, it is necessary to renounce
the condition that the variance is finite or, which is the same, that $\varphi''(0)$ exists. In
this case, however, we will have to impose some conditions on the regular variation
of the functions $F_+(x) = \mathbf{P}(\xi \ge x)$ and/or $F_-(x) = \mathbf{P}(\xi < -x)$ as $x \to \infty$, which
we will call the right and the left tail of the distribution of $\xi$, respectively. We will
need the following concepts.


Definition 8.8.1 A positive (Lebesgue) measurable function $L(t)$ is called a slowly
varying function (s.v.f.) as $t \to \infty$, if, for any fixed $v > 0$,

$$\frac{L(vt)}{L(t)} \to 1 \quad \text{as } t \to \infty. \tag{8.8.1}$$

A function $V(t)$ is called a regularly varying function (r.v.f.) (of index $-\beta$) as
$t \to \infty$ if it can be represented as

$$V(t) = t^{-\beta}L(t), \tag{8.8.2}$$

where $L(t)$ is an s.v.f. as $t \to \infty$.


One can easily see that, similarly to (8.8.1), the characteristic property of
regularly varying functions is the convergence

$$\frac{V(vt)}{V(t)} \to v^{-\beta} \quad \text{as } t \to \infty \tag{8.8.3}$$

for any fixed $v > 0$. Thus an s.v.f. is an r.v.f. of index zero.

Among typical representatives of the class of s.v.f.s are the logarithmic function and
its powers $\ln^\gamma t$, $\gamma \in \mathbb{R}$, linear combinations thereof, multiple logarithms, functions
with the property that $L(t) \to L = \mathrm{const} \ne 0$ as $t \to \infty$ etc. As an example of a
bounded oscillating s.v.f. we mention

$$L_0(t) = 2 + \sin(\ln\ln t), \qquad t > 1.$$
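Properties (8.8.1) and (8.8.3) are easy to probe numerically, and doing so shows how slow the convergence can be. The sketch below is an added illustration: it evaluates the ratios for the oscillating function $L_0$ above and for $V(t) = t^{-\beta}L_0(t)$ with the arbitrarily chosen index $\beta = 1.5$. The values of $t$ are kept below the floating-point underflow range of $t^{-\beta}$.

```python
import math

def L0(t):
    """The bounded oscillating s.v.f. L_0(t) = 2 + sin(ln ln t), t > 1."""
    return 2.0 + math.sin(math.log(math.log(t)))

def V(t, beta=1.5):
    """A regularly varying function of index -beta built from L_0 (beta is illustrative)."""
    return t ** (-beta) * L0(t)

def sv_ratio(t, v):
    """L_0(v t) / L_0(t), which tends to 1 as t -> infinity by (8.8.1)."""
    return L0(v * t) / L0(t)

def rv_ratio(t, v, beta=1.5):
    """V(v t) / V(t), which tends to v^{-beta} as t -> infinity by (8.8.3)."""
    return V(v * t, beta) / V(t, beta)

for t in (1e3, 1e10, 1e30, 1e100):
    print(t, sv_ratio(t, 10.0), rv_ratio(t, 10.0) / 10.0 ** -1.5)
```

Both printed columns approach 1, but only at the rate of $\ln\ln(vt) - \ln\ln t$, which is why very large arguments are needed to see the limit.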


The main properties of r.v.f.s are given in Appendix 6. 




As has already been noted, for $S_n/b(n)$ to converge to a "nondegenerate" limiting
law under a suitable normalisation $b(n)$, we will have to impose conditions on the
regular variation of the distribution tails of $\xi$. More precisely, we will need a regular
variation of the "two-sided tail"

$$F_0(t) = F_-(t) + F_+(t) = \mathbf{P}(\xi \notin [-t, t)).$$

We will assume that the following condition is satisfied for some $\beta \in (0, 2]$,
$\rho \in [-1, 1]$:

$[\mathbf{R}_{\beta,\rho}]$ The two-sided tail $F_0(x) = F_-(x) + F_+(x)$ is an r.v.f. as $x \to \infty$, i.e. it
can be represented as

$$F_0(x) = x^{-\beta}L_{F_0}(x), \qquad \beta \in (0, 2], \tag{8.8.4}$$

where $L_{F_0}(x)$ is an s.v.f., and the following limit exists

$$\rho_+ := \lim_{x\to\infty}\frac{F_+(x)}{F_0(x)} \in [0, 1], \qquad \rho := 2\rho_+ - 1. \tag{8.8.5}$$

If $\rho_+ > 0$, then clearly the right tail $F_+(x)$ is an r.v.f. like $F_0(x)$, i.e. it can be
represented as

$$F_+(x) = V(x) := x^{-\beta}L(x), \qquad \beta \in (0, 2], \qquad L(x) \sim \rho_+L_{F_0}(x).$$

(Here, and likewise in Appendix 6, we use the symbol $V$ to denote an r.v.f.) If
$\rho_+ = 0$, then the right tail $F_+(x) = o(F_0(x))$ is not assumed to be regularly varying.
Relation (8.8.5) implies that the following limit also exists

$$\rho_- := \lim_{x\to\infty}\frac{F_-(x)}{F_0(x)} = 1 - \rho_+.$$

If $\rho_- > 0$, then, similarly to the case of the right tail, the left tail $F_-(x)$ can be
represented as

$$F_-(x) = W(x) := x^{-\beta}L_W(x), \qquad \beta \in (0, 2], \qquad L_W(x) \sim \rho_-L_{F_0}(x).$$

If $\rho_- = 0$, then the left tail $F_-(x) = o(F_0(x))$ is not assumed to be regularly varying.
The parameters $\rho_\pm$ are related to the parameter $\rho$ in the notation $[\mathbf{R}_{\beta,\rho}]$ through
the equalities

$$\rho = \rho_+ - \rho_- = 2\rho_+ - 1 \in [-1, 1].$$
Clearly, in the case $\beta < 2$ we have $\mathbf{E}\xi^2 = \infty$, so that the representation

$$\varphi(t) = 1 - \frac{t^2\sigma^2}{2} + o(t^2) \quad \text{as } t \to 0$$

no longer holds, and the central limit theorem is not applicable. If $\mathbf{E}\xi$ exists and is
finite then everywhere in what follows it will be assumed without loss of generality
that

$$\mathbf{E}\xi = 0.$$

Since $F_0(x)$ is non-increasing, there always exists the "generalised" inverse function
$F_0^{(-1)}(u)$ understood as

$$F_0^{(-1)}(u) := \inf\{x : F_0(x) < u\}.$$

If the function $F_0$ is strictly monotone and continuous then $b = F_0^{(-1)}(u)$ is the
unique solution to the equation

$$F_0(b) = u, \qquad u \in (0, 1).$$

Set

$$\zeta_n := \frac{S_n}{b(n)},$$

where in the case $\beta < 2$ we define the normalising factor $b(n)$ by

$$b(n) := F_0^{(-1)}(1/n). \tag{8.8.6}$$

For $\beta = 2$ put

$$b(n) := Y^{(-1)}(1/n), \tag{8.8.7}$$

where

$$Y(x) := 2x^{-2}\int_0^x yF_0(y)\,dy = 2x^{-2}\Big[\int_0^x yF_+(y)\,dy + \int_0^x yF_-(y)\,dy\Big]
= x^{-2}\mathbf{E}(\xi^2;\ -x \le \xi < x) = x^{-2}L_Y(x), \tag{8.8.8}$$

$L_Y$ is an s.v.f. (see Theorem A6.2.1(iv) in Appendix 6). It follows from
Theorem A6.2.1(v) in Appendix 6 that, under condition (8.8.4), we have

$$b(n) = n^{1/\beta}L_b(n), \qquad \beta \le 2,$$

where $L_b$ is an s.v.f.
We introduce the functions

$$V_I(x) := \int_0^x V(y)\,dy, \qquad V^I(x) := \int_x^\infty V(y)\,dy.$$


8.8.1 The Integral Theorem 


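Theorem 8.8.1 Let condition $[\mathbf{R}_{\beta,\rho}]$ be satisfied. Then the following assertions hold
true.

(i) For $\beta \in (0, 2)$, $\beta \ne 1$ and the normalising factor (8.8.6), as $n \to \infty$,

$$\zeta_n \Rightarrow \zeta^{(\beta,\rho)}. \tag{8.8.9}$$

The distribution $\mathbf{F}_{\beta,\rho}$ of the random variable $\zeta^{(\beta,\rho)}$ depends on parameters $\beta$
and $\rho$ only and has a ch.f. $\varphi^{(\beta,\rho)}(t)$, given by

$$\varphi^{(\beta,\rho)}(t) := \mathbf{E}e^{it\zeta^{(\beta,\rho)}} = \exp\big\{|t|^\beta B(\beta, \rho, \vartheta)\big\}, \tag{8.8.10}$$

where $\vartheta = \operatorname{sign} t$,

$$B(\beta, \rho, \vartheta) = \Gamma(1 - \beta)\Big[i\rho\vartheta\sin\frac{\beta\pi}{2} - \cos\frac{\beta\pi}{2}\Big], \tag{8.8.11}$$

and, for $\beta \in (1, 2)$, we put $\Gamma(1 - \beta) = \Gamma(2 - \beta)/(1 - \beta)$.
(ii) When $\beta = 1$, for the sequence $\zeta_n$ with the normalising factor (8.8.6) to
converge to a limiting law, the former, generally speaking, needs to be centred.
More precisely, as $n \to \infty$, the following convergence takes place:

$$\zeta_n - A_n \Rightarrow \zeta^{(1,\rho)}, \tag{8.8.12}$$

where

$$A_n = \frac{n}{b(n)}\big[V_I(b(n)) - W_I(b(n))\big] - \rho C, \tag{8.8.13}$$

$C \approx 0.5772$ is the Euler constant, and

$$\varphi^{(1,\rho)}(t) = \mathbf{E}e^{it\zeta^{(1,\rho)}} = \exp\Big\{-\frac{\pi|t|}{2} - i\rho t\ln|t|\Big\}. \tag{8.8.14}$$

If $n[V_I(b(n)) - W_I(b(n))] = o(b(n))$, then $\rho = 0$ and we can put $A_n = 0$.
If $\mathbf{E}\xi$ exists and equals zero then

$$A_n = \frac{n}{b(n)}\big[W^I(b(n)) - V^I(b(n))\big] - \rho C.$$

If $\mathbf{E}\xi = 0$ and $\rho \ne 0$ then $\rho A_n \to -\infty$ as $n \to \infty$.
(iii) For $\beta = 2$ and the normalising factor (8.8.7), as $n \to \infty$,

$$\zeta_n \Rightarrow \zeta^{(2,\rho)}, \qquad \varphi^{(2,\rho)}(t) := \mathbf{E}e^{it\zeta^{(2,\rho)}} = e^{-t^2/2},$$

so that $\zeta^{(2,\rho)}$ has the standard normal distribution that is independent of $\rho$.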


The proof of Theorem 8.8.1 is based on the same considerations as the proof of
Theorem 8.2.1, i.e. on using the asymptotic behaviour of the ch.f. $\varphi(t)$ in the vicinity
of zero. But here it is somewhat more difficult from the technical viewpoint.
This is why the proof of Theorem 8.8.1 appears in Appendix 7.




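Remark 8.8.1 The last assertion of the theorem (for $\beta = 2$) shows that the limiting
distribution may be normal even in the case of infinite variance of $\xi$.

Besides the normal distribution, we also note the "extreme" limit distributions,
corresponding to $\rho = \pm1$, where the ch.f. $\varphi^{(\beta,\rho)}$ (or the respective Laplace
transform) takes a very simple form. Let, for example, $\rho = -1$. Since $e^{i\pi\vartheta/2} = i\vartheta$, then,
for $\beta \ne 1, 2$,

$$B(\beta, -1, \vartheta) = -\Gamma(1 - \beta)\Big[i\vartheta\sin\frac{\beta\pi}{2} + \cos\frac{\beta\pi}{2}\Big]
= -\Gamma(1 - \beta)e^{i\beta\pi\vartheta/2} = -\Gamma(1 - \beta)(i\vartheta)^\beta,$$

$$\varphi^{(\beta,-1)}(t) = \exp\{-\Gamma(1 - \beta)(it)^\beta\},$$

$$\mathbf{E}e^{\lambda\zeta^{(\beta,-1)}} = \exp\{-\Gamma(1 - \beta)\lambda^\beta\}, \qquad \operatorname{Re}\lambda \ge 0.$$

Similarly, for $\beta = 1$, by (8.8.14) and the equalities $-\frac{\pi\vartheta}{2} = i\cdot\frac{i\pi\vartheta}{2} = i\ln(i\vartheta)$ we have

$$\ln\varphi^{(1,-1)}(t) = -\frac{\pi|t|}{2} + it\ln|t| = it\ln(i\vartheta) + it\ln|t| = it\ln(it),$$

$$\mathbf{E}e^{\lambda\zeta^{(1,-1)}} = \exp\{\lambda\ln\lambda\}, \qquad \operatorname{Re}\lambda \ge 0.$$

A similar formula is valid for $\rho = 1$.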
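The two closed forms for $\rho = -1$ can be checked numerically against the general expressions. The sketch below is an added illustration. It assumes the forms $\varphi^{(\beta,\rho)}(t) = \exp\{|t|^\beta\,\Gamma(1-\beta)[i\rho\vartheta\sin(\beta\pi/2) - \cos(\beta\pi/2)]\}$ for $\beta \ne 1$ and $\ln\varphi^{(1,\rho)}(t) = -\pi|t|/2 - i\rho t\ln|t|$ for $\beta = 1$, and relies on Python's principal branch for complex powers and logarithms, which agrees with $(it)^\beta = |t|^\beta e^{i\beta\pi\vartheta/2}$.

```python
import cmath
import math

def log_chf_general(t, beta, rho):
    """ln of the stable ch.f. for beta != 1 in the assumed form |t|^beta * B(beta, rho, sign t)."""
    theta = 1.0 if t > 0 else -1.0
    b = math.gamma(1.0 - beta) * (1j * rho * theta * math.sin(beta * math.pi / 2)
                                  - math.cos(beta * math.pi / 2))
    return abs(t) ** beta * b

def log_chf_extreme(t, beta):
    """Closed form for rho = -1, beta != 1: -Gamma(1 - beta) * (i t)^beta (principal branch)."""
    return -math.gamma(1.0 - beta) * (1j * t) ** beta

def log_chf_beta1(t, rho):
    """ln of the ch.f. for beta = 1 in the assumed form -pi|t|/2 - i rho t ln|t|."""
    return -math.pi * abs(t) / 2 - 1j * rho * t * math.log(abs(t))

def log_chf_beta1_extreme(t):
    """Closed form for beta = 1, rho = -1: i t ln(i t) (principal branch of the logarithm)."""
    return 1j * t * cmath.log(1j * t)

for t in (-2.0, -0.8, 0.5, 3.0):
    for beta in (0.6, 1.4):
        print(t, beta, abs(log_chf_general(t, beta, -1.0) - log_chf_extreme(t, beta)))
    print(t, 1.0, abs(log_chf_beta1(t, -1.0) - log_chf_beta1_extreme(t)))
```

All printed differences should be at the level of rounding error. Note that `math.gamma` accepts negative non-integer arguments, so no separate treatment of $\beta \in (1,2)$ is needed.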


Remark 8.8.2 If $\beta < 2$, then by virtue of the properties of s.v.f.s (see
Theorem A6.2.1(iv) in Appendix 6), as $x \to \infty$,

$$\int_0^x yF_0(y)\,dy = \int_0^x y^{1-\beta}L_{F_0}(y)\,dy \sim \frac{1}{2-\beta}\,x^{2-\beta}L_{F_0}(x) = \frac{1}{2-\beta}\,x^2F_0(x).$$

Therefore, for $\beta < 2$, we have $Y(x) \sim 2(2 - \beta)^{-1}F_0(x)$,

$$Y^{(-1)}(1/n) \sim F_0^{(-1)}\Big(\frac{2-\beta}{2n}\Big) \sim \Big(\frac{2}{2-\beta}\Big)^{1/\beta}F_0^{(-1)}(1/n)$$

(cf. (8.8.6)). On the other hand, for $\beta = 2$ and $\sigma^2 := \mathbf{E}\xi^2 < \infty$ one has

$$Y(x) \sim x^{-2}\sigma^2, \qquad b(n) = Y^{(-1)}(1/n) \sim \sigma\sqrt{n}.$$

Thus normalisation (8.8.7) is "transitional" from normalisation (8.8.6) (up to the
constant factor $(2/(2 - \beta))^{1/\beta}$) to the standard normalisation $\sigma\sqrt{n}$ in the central
limit theorem in the case where $\mathbf{E}\xi^2 < \infty$. This also means that normalisation
(8.8.7) is "universal" and can be used for all $\beta \le 2$ (as it is done in many
textbooks on probability theory). However, as we will see below, in the case $\beta < 2$
normalisation (8.8.6) is easier and simpler to deal with, and therefore we will use
that scaling.
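The relation between the two normalisations can be seen on a concrete tail. The sketch below is an added illustration that assumes the pure power tail $F_0(x) = \min(1, x^{-\beta})$ with $\beta < 2$ (so $L_{F_0} \equiv 1$ for $x \ge 1$), for which $Y(x)$ has an elementary closed form. The generalised inverses are found by bisection, and all function names are hypothetical.

```python
def two_sided_tail(x, beta):
    """Assumed tail F_0(x) = min(1, x^{-beta})."""
    return 1.0 if x <= 1.0 else x ** (-beta)

def y_function(x, beta):
    """Y(x) = 2 x^{-2} int_0^x y F_0(y) dy for the tail above (closed form, beta < 2)."""
    if x <= 1.0:
        return 1.0
    return 2.0 * x ** (-2) * (0.5 + (x ** (2.0 - beta) - 1.0) / (2.0 - beta))

def generalised_inverse(g, u, iters=200):
    """inf{x : g(x) < u} for a non-increasing g with g(x) -> 0, found by bisection."""
    lo, hi = 0.0, 1.0
    while g(hi) >= u:          # move the upper end until g(hi) < u
        hi *= 2.0
    for _ in range(iters):     # invariant: g(lo) >= u (or lo = 0) and g(hi) < u
        mid = 0.5 * (lo + hi)
        if g(mid) < u:
            hi = mid
        else:
            lo = mid
    return hi

def b_tail(n, beta):
    """Normalising factor (8.8.6): F_0^{(-1)}(1/n)."""
    return generalised_inverse(lambda x: two_sided_tail(x, beta), 1.0 / n)

def b_y(n, beta):
    """Normalising factor (8.8.7): Y^{(-1)}(1/n)."""
    return generalised_inverse(lambda x: y_function(x, beta), 1.0 / n)

for beta in (0.5, 1.0, 1.5):
    n = 10 ** 6
    print(beta, b_tail(n, beta), b_y(n, beta) / b_tail(n, beta), (2 / (2 - beta)) ** (1 / beta))
```

For this tail $b(n)$ from (8.8.6) equals $n^{1/\beta}$ exactly, and the ratio of the two normalisations approaches $(2/(2-\beta))^{1/\beta}$, more slowly as $\beta$ gets closer to 2.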




Recall that $\mathbf{F}_{\beta,\rho}$ denotes the distribution of the random variable $\zeta^{(\beta,\rho)}$. The
parameter $\beta$ takes values in the interval $(0, 2]$, the parameter $\rho = \rho_+ - \rho_-$ can assume
any value from $[-1, 1]$. The role of the parameters $\beta$ and $\rho$ will be clarified below.

Theorem 8.8.1 implies that each of the laws $\mathbf{F}_{\beta,\rho}$, $0 < \beta \le 2$ and $-1 \le \rho \le 1$, is
limiting for the distributions of suitably normalised sums of independent identically
distributed random variables. It follows from the law of large numbers that the
degenerate distribution $\mathbf{I}_a$ concentrated at the point $a$ is also a limiting one. Denote the
set of all such distributions by $\mathfrak{S}_0$. Furthermore, it is not hard to see that if $\mathbf{F}$ is a
distribution from the class $\mathfrak{S}_0$ then the law that differs from $\mathbf{F}$ by scaling and shifting,
i.e. the distribution $\mathbf{F}_{\{a,b\}}$ defined, for some fixed $b > 0$ and $a$, by the relation

$$\mathbf{F}_{\{a,b\}}(B) := \mathbf{F}\Big(\frac{B - a}{b}\Big), \quad \text{where } \frac{B - a}{b} = \{u \in \mathbb{R} : ub + a \in B\},$$

is also limiting for the distributions of sums of random variables $(S_n - a_n)/b_n$ as
$n \to \infty$ for appropriate $\{a_n\}$ and $\{b_n\}$.

It turns out that the class of distributions $\mathfrak{S}$ obtained by the above extension from
$\mathfrak{S}_0$ exhausts all the limiting laws for sums of identically distributed independent
random variables.

Another characterisation of the class of limiting laws G is also possible. 


Definition 8.8.2 We call a distribution $\mathbf{F}$ stable if, for any $a_1$, $a_2$, $b_1 > 0$, $b_2 > 0$,
there exist $a$ and $b > 0$ such that

$$\mathbf{F}_{\{a_1,b_1\}} * \mathbf{F}_{\{a_2,b_2\}} = \mathbf{F}_{\{a,b\}}.$$

This definition means that the convolution of a stable distribution $\mathbf{F}$ with itself
again yields the same distribution $\mathbf{F}$, up to a scaling and shift (or, which is the
same, for independent random variables $\xi_i \in \mathbf{F}$ we have $(\xi_1 + \xi_2 - a)/b \in \mathbf{F}$ for
appropriate $a$ and $b$).

In terms of the ch.f. $\varphi$, the stability property has the following form: for any
$b_1 > 0$ and $b_2 > 0$, there exist $a$ and $b > 0$ such that

$$\varphi(tb_1)\varphi(tb_2) = e^{ita}\varphi(tb), \qquad t \in \mathbb{R}. \tag{8.8.15}$$

Denote the class of all stable laws by $\mathfrak{S}^*$. The remarkable fact is that the class of all
limiting laws $\mathfrak{S}$ (for $(S_n - a_n)/b_n$ for some $a_n$ and $b_n$) and the class of all stable
laws $\mathfrak{S}^*$ coincide.

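If, under a suitable normalisation, as $n \to \infty$,

$$\zeta_n \Rightarrow \zeta^{(\beta,\rho)},$$

then one says that the distribution $\mathbf{F}$ of the summands $\xi$ belongs to the domain of
attraction of the stable law $\mathbf{F}_{\beta,\rho}$.

Theorem 8.8.1 means that, if $\mathbf{F}$ satisfies condition $[\mathbf{R}_{\beta,\rho}]$, then $\mathbf{F}$ belongs to the
domain of attraction of the stable law $\mathbf{F}_{\beta,\rho}$.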




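One can prove the converse assertion (see e.g. Chap. XVII, § 5 in [30]): if $\mathbf{F}$
belongs to the domain of attraction of a stable law $\mathbf{F}_{\beta,\rho}$ for $\beta < 2$, then $[\mathbf{R}_{\beta,\rho}]$ is
satisfied.

As for the role of the parameters $\beta$ and $\rho$, note the following. The parameter $\beta$
characterises the rate of convergence to zero as $x \to \infty$ for the functions

$$F_{\beta,\rho,-}(x) := \mathbf{F}_{\beta,\rho}((-\infty, -x)) \quad \text{and} \quad F_{\beta,\rho,+}(x) := \mathbf{F}_{\beta,\rho}([x, \infty)).$$

One can prove that, for $\rho_+ > 0$, as $t \to \infty$,

$$F_{\beta,\rho,+}(t) \sim \rho_+t^{-\beta}, \tag{8.8.16}$$

and, for $\rho_- > 0$, as $t \to \infty$,

$$F_{\beta,\rho,-}(t) \sim \rho_-t^{-\beta}. \tag{8.8.17}$$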


Note that, for $\xi \in \mathbf{F}_{\beta,\rho}$, the asymptotic relations in Theorem 8.8.1 turn into
precise equalities provided that we replace in them $b(n)$ with $b_n := n^{1/\beta}$. In particular,

$$\mathbf{P}\Big(\frac{S_n}{b_n} \ge t\Big) = F_{\beta,\rho,+}(t). \tag{8.8.18}$$

This follows from the fact that $[\varphi^{(\beta,\rho)}(t/b_n)]^n$ coincides with $\varphi^{(\beta,\rho)}(t)$ (see (8.8.10))
and hence the distribution of the normalised sum $S_n/b_n$ coincides with the
distribution of the random variable $\xi$.
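The identity $[\varphi^{(\beta,\rho)}(t/b_n)]^n = \varphi^{(\beta,\rho)}(t)$ with $b_n = n^{1/\beta}$ can be verified directly in floating point. The sketch below is an added illustration for $\beta \ne 1$, assuming the ch.f. in the form $\exp\{|t|^\beta\,\Gamma(1-\beta)[i\rho\vartheta\sin(\beta\pi/2) - \cos(\beta\pi/2)]\}$ with $\vartheta = \operatorname{sign} t$. The parameter values are arbitrary.

```python
import cmath
import math

def stable_chf(t, beta, rho):
    """Ch.f. of F_{beta, rho} for beta in (0, 2), beta != 1, in the assumed form (8.8.10)-(8.8.11)."""
    if t == 0:
        return 1.0 + 0.0j
    theta = 1.0 if t > 0 else -1.0
    b = math.gamma(1.0 - beta) * (1j * rho * theta * math.sin(beta * math.pi / 2)
                                  - math.cos(beta * math.pi / 2))
    return cmath.exp(abs(t) ** beta * b)

def stability_gap(t, n, beta, rho):
    """|phi(t / n^{1/beta})^n - phi(t)|, which should vanish up to rounding error."""
    b_n = n ** (1.0 / beta)
    return abs(stable_chf(t / b_n, beta, rho) ** n - stable_chf(t, beta, rho))

for beta, rho in ((0.5, 1.0), (1.5, 0.3), (1.8, -1.0)):
    print(beta, rho, stability_gap(0.7, 50, beta, rho), stability_gap(-2.0, 1000, beta, rho))
```

The check works because $n\,|t/b_n|^\beta = |t|^\beta$ and the sign of $t$ is unchanged by the scaling, which is exactly the argument behind (8.8.18).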

The parameter $\rho$ taking values in $[-1, 1]$ is the measure of asymmetry of the
distribution $\mathbf{F}_{\beta,\rho}$. If, for instance, $\rho = 1$ ($\rho_- = 0$), then, for $\beta < 1$, the distribution $\mathbf{F}_{\beta,1}$
is concentrated entirely on the positive half-line. This is evident from the fact that in
this case $\mathbf{F}_{\beta,1}$ can be considered as the limiting distribution for the normalised sums
of independent identically distributed random variables $\xi \ge 0$ (with $F_-(0) = 0$).
Since all the prelimit distributions are concentrated on the positive half-line, so is
the limiting distribution.

Similarly, for $\rho = -1$ and $\beta < 1$, the distribution $\mathbf{F}_{\beta,-1}$ is entirely concentrated
on the negative half-line. For $\rho = 0$ ($\rho_+ = \rho_- = 1/2$) the ch.f. of the distribution
$\mathbf{F}_{\beta,0}$ will be real, and the distribution $\mathbf{F}_{\beta,0}$ itself is symmetric.

As we saw above, the ch.f.s $\varphi^{(\beta,\rho)}(t)$ of stable laws $\mathbf{F}_{\beta,\rho}$ admit closed-form
representations. They are clearly integrable over $\mathbb{R}$, and the same is true for the
functions $t^k\varphi^{(\beta,\rho)}(t)$ for any $k \ge 1$. Therefore all the stable distributions have densities
that are differentiable arbitrarily many times (see e.g. the inversion formula (7.2.1)).
As for explicit forms of these densities, they are only known for a few laws. Among
them are:




1. The normal law $\mathbf{F}_{2,\rho}$ (which does not depend on $\rho$).

2. The Cauchy distribution $\mathbf{F}_{1,0}$ with density $2/(\pi^2 + 4x^2)$, $-\infty < x < \infty$. Scaling the $x$-axis with a factor of $\pi/2$ transforms this density into the form $1/\big(\pi(1 + x^2)\big)$ corresponding to $\mathbf{K}_{0,1}$.

3. The Lévy distribution. This law can be obtained from the explicit form for the distribution of the maximum of the Wiener process. This will be the distribution $\mathbf{F}_{1/2,1}$ with parameters 1/2, 1 and density (up to scaling; cf. (8.8.16))

$$f^{(1/2,1)}(x) = \frac{1}{\sqrt{2\pi}\,x^{3/2}}\, e^{-1/(2x)}, \quad x > 0$$

(this is the density of the first hitting time of level 1 by the standard Wiener process, see Theorem 19.2.2).
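The two closed-form densities above lend themselves to a quick numerical sanity check. The following is only a sketch: the function names are ours, and the tail formula $\mathbf{P}(\xi \ge t) = \operatorname{erf}\big(1/\sqrt{2t}\big)$ for the Lévy density is obtained here by direct integration rather than quoted from the text.

```python
import math

def book_cauchy_density(x):
    """Density 2 / (pi^2 + 4 x^2) quoted above for F_{1,0}."""
    return 2.0 / (math.pi ** 2 + 4.0 * x * x)

def standard_cauchy_density(x):
    """Density 1 / (pi (1 + x^2)) of the standard Cauchy law."""
    return 1.0 / (math.pi * (1.0 + x * x))

def levy_density(x):
    """Density exp(-1/(2x)) / (sqrt(2 pi) x^{3/2}) for x > 0."""
    return math.exp(-1.0 / (2.0 * x)) / (math.sqrt(2.0 * math.pi) * x ** 1.5)

def levy_tail(t):
    """P(xi >= t) for the density above (assumed closed form via erf)."""
    return math.erf(1.0 / math.sqrt(2.0 * t))

if __name__ == "__main__":
    # Rescaling the x-axis by pi/2 turns the first density into the second.
    y = 0.8
    print((math.pi / 2.0) * book_cauchy_density(math.pi * y / 2.0),
          standard_cauchy_density(y))
    # The Levy tail decays like sqrt(2/pi) * t^{-1/2}, i.e. like t^{-beta}
    # with beta = 1/2 up to a scaling constant.
    for t in (1e2, 1e4, 1e6):
        print(t, levy_tail(t) / math.sqrt(2.0 / (math.pi * t)))
```

The printed ratios approach 1, which agrees with the power-law tail in (8.8.16) for $\beta = 1/2$ up to the scaling mentioned in item 3.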


8.8.2 The Integro-Local and Local Theorems 


Under the conditions of this section we can also obtain integro-local and local theorems in the same way as in Sect. 8.7 in the case of convergence to the normal law. As in Sect. 8.7, integro-local theorems deal here with the asymptotics of

$$\mathbf{P}\big(S_n \in \Delta[x)\big), \quad \Delta[x) = [x, x + \Delta)$$

as $n \to \infty$ for a fixed $\Delta > 0$.

As we can see from Theorem 8.8.1, the ch.f. $\varphi^{(\beta,\rho)}(t)$ of the stable law $\mathbf{F}_{\beta,\rho}$ is integrable, and hence, by the inversion formula, there exists a uniformly continuous density $f^{(\beta,\rho)}$ of the distribution $\mathbf{F}_{\beta,\rho}$. (As has already been noted, it is not difficult to show that $f^{(\beta,\rho)}$ is differentiable arbitrarily many times, see Sect. 7.2.)


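Theorem 8.8.2 (The Stone integro-local theorem) Let $\xi$ be a non-lattice random variable and the conditions of Theorem 8.8.1 be met. Then, for any fixed $\Delta > 0$, as $n \to \infty$,

$$\mathbf{P}\big(S_n \in \Delta[x)\big) = \frac{\Delta}{b(n)}\, f^{(\beta,\rho)}\Big(\frac{x}{b(n)}\Big) + o\Big(\frac{1}{b(n)}\Big), \tag{8.8.19}$$

where the remainder term $o\big(\frac{1}{b(n)}\big)$ is uniform over $x$.
If $\beta = 1$ and $\mathbf{E}|\xi|$ does not exist then, on the right-hand side of (8.8.19), we must replace $f^{(\beta,\rho)}\big(\frac{x}{b(n)}\big)$ with $f^{(\beta,\rho)}\big(\frac{x}{b(n)} - A_n\big)$, where $A_n$ is defined in (8.8.13).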


All the remarks to the integro-local Theorem 8.7.1 hold true here as well, with 
evident changes. 

If the distribution of $S_n$ has a density then we can find the asymptotics of that density.




Theorem 8.8.3 Let there exist an $m \ge 1$ such that at least one of conditions (a)–(c) of Theorem 8.7.2 is satisfied. Moreover, let the conditions of Theorem 8.8.1 be met. Then for the density $f_{S_n}(x)$ of the distribution of $S_n$ one has the representation

$$f_{S_n}(x) = \frac{1}{b(n)}\, f^{(\beta,\rho)}\Big(\frac{x}{b(n)}\Big) + o\Big(\frac{1}{b(n)}\Big) \tag{8.8.20}$$

which holds uniformly in $x$ as $n \to \infty$.
If $\beta = 1$ and $\mathbf{E}|\xi|$ does not exist then, on the right-hand side of (8.8.20), we must replace $f^{(\beta,\rho)}\big(\frac{x}{b(n)}\big)$ with $f^{(\beta,\rho)}\big(\frac{x}{b(n)} - A_n\big)$, where $A_n$ is defined in (8.8.13).

The assertion of Theorem 8.8.3 can be rewritten for $\zeta_n = \frac{S_n}{b(n)} - A_n$ as

$$f_{\zeta_n}(v) \to f^{(\beta,\rho)}(v)$$

for any $v$ as $n \to \infty$.
For integer-valued $\xi_k$ the following theorem holds true.


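Theorem 8.8.4 Let the distribution of $\xi$ be arithmetic and the conditions of Theorem 8.8.1 be met. Then, uniformly for all integers $x$, as $n \to \infty$,

$$\mathbf{P}(S_n = x) = \frac{1}{b(n)}\, f^{(\beta,\rho)}\Big(\frac{x - an}{b(n)}\Big) + o\Big(\frac{1}{b(n)}\Big), \tag{8.8.21}$$

where $a = \mathbf{E}\xi$ if $\mathbf{E}|\xi|$ exists and $a = 0$ if $\mathbf{E}|\xi|$ does not exist, $\beta \ne 1$. If $\beta = 1$ and $\mathbf{E}|\xi|$ does not exist then, on the right-hand side of (8.8.21), we must replace $f^{(\beta,\rho)}\big(\frac{x - an}{b(n)}\big)$ with $f^{(\beta,\rho)}\big(\frac{x}{b(n)} - A_n\big)$.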


The proofs of Theorems 8.8.2—8.8.4 mostly repeat those of Theorems 8.7.1—8.7.3 
and can be found in Appendix 7. 


8.8.3 An Example 


In conclusion we will consider an example. 

In Sect. 12.8 we will see that in the fair game considered in Example 4.2.3 the ruin time $\eta(z)$ of a gambler with an initial capital of $z$ units satisfies the relation $\mathbf{P}(\eta(z) \ge n) \sim z\sqrt{2/(\pi n)}$ as $n \to \infty$. In particular, for $z = 1$,

$$\mathbf{P}(\eta(1) \ge n) \sim \sqrt{2/(\pi n)}. \tag{8.8.22}$$
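The tail asymptotics above can be checked numerically for the symmetric $\pm 1$ game. The sketch below is illustrative only: the function name is ours, and it computes $\mathbf{P}(\eta(z) > n)$ exactly by propagating the distribution of the gambler's capital over $n$ rounds with absorption at zero, under the assumption that each round changes the capital by $\pm 1$ with probability $1/2$.

```python
import math

def survival_probability(z, n):
    """P(eta(z) > n): the capital, started at z, stays positive for n fair +-1 rounds."""
    probs = {z: 1.0}  # probs[c] = P(capital == c now and no ruin so far)
    for _ in range(n):
        nxt = {}
        for capital, p in probs.items():
            for step in (-1, 1):
                c2 = capital + step
                if c2 > 0:  # paths reaching 0 are ruined and dropped
                    nxt[c2] = nxt.get(c2, 0.0) + 0.5 * p
        probs = nxt
    return sum(probs.values())

if __name__ == "__main__":
    for z, n in ((1, 100), (1, 400), (3, 1000)):
        exact = survival_probability(z, n)
        asymptotic = z * math.sqrt(2.0 / (math.pi * n))
        print(z, n, exact, asymptotic, exact / asymptotic)
```

The ratio in the last column tends to 1 as $n$ grows with $z$ fixed, in agreement with (8.8.22).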


It is not hard to see (for more detail, see also Chap. 12) that $\eta(z)$ has the same distribution as $\eta_1 + \eta_2 + \cdots + \eta_z$, where $\eta_j$ are independent and distributed as $\eta(1)$. Thus for studying the distribution of $\eta(z)$ when $z$ is large, by virtue of (8.8.22), one can make use of Theorem 8.8.4 (with $\beta = 1/2$, $b(n) = 2n^2/\pi$), by which

$$\lim_{z \to \infty} \mathbf{P}\Big(\frac{\eta(z)}{b(z)} < x\Big) = F_{1/2,1}(x) \tag{8.8.23}$$

is the Lévy stable law with parameters $\beta = 1/2$ and $\rho = 1$. Moreover, for integer $x$ and $z \to \infty$,

$$\mathbf{P}(\eta(z) = x) = \frac{1}{b(z)}\, f^{(1/2,1)}\Big(\frac{x}{b(z)}\Big) + o\Big(\frac{1}{b(z)}\Big), \quad b(z) = \frac{2z^2}{\pi}.$$


These assertions enable one to obtain the limiting distribution for the number of crossings of an arbitrary strip $[u, v]$ by the trajectory $S_1, \ldots, S_n$ in the case where

$$\mathbf{P}(\xi = -1) = \mathbf{P}(\xi = 1) = 1/2.$$


Indeed, let for simplicity $u = 0$. By the first positive crossing of the strip $[0, v]$ we will mean the Markov time

$$\eta_+ := \min\{k : S_k = v\}.$$

The first negative crossing of the strip is then defined as the time $\eta_+ + \eta_-$, where

$$\eta_- := \min\{k : S_{\eta_+ + k} = 0\}.$$

The time $\eta_1 = \eta_+ + \eta_-$ will also be the time of the “double crossing” of $[0, v]$. The variables $\eta_\pm$ are distributed as $\eta(v)$ and are independent, so that $\eta_1$ has the same distribution as $\eta(2v)$. The variable $H_k = \eta_1(2v) + \cdots + \eta_k(2v)$, where $\eta_j(2v)$ have the same distribution as $\eta(2v)$ and are independent, is the time of the $k$-th double crossing. Therefore


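$$\nu(n) := \max\{k : H_k \le n\} = \min\{k : H_k > n\} - 1$$

is the number of double crossings of the strip $[0, v]$ by time $n$. Now we can prove the following assertion:

$$\lim_{n \to \infty} \mathbf{P}\Big(\frac{\nu(n)}{\sqrt{n}} \ge x\Big) = F_{1/2,1}\Big(\frac{\pi}{2(2vx)^2}\Big). \tag{8.8.24}$$

To prove it, we will make use of the following relation (which will play, in its more general form, an important role in Chap. 10):

$$\{\nu(n) \ge k\} = \{H_k \le n\},$$

where $H_k$ is distributed as $\eta(2vk)$. If $n/k^2 \to s^2$ as $n \to \infty$, then by virtue of (8.8.23) (recall that $b(2vk) = 2(2vk)^2/\pi$)

$$\mathbf{P}(H_k \le n) = \mathbf{P}\Big(\frac{\pi H_k}{2(2vk)^2} \le \frac{\pi n}{2(2vk)^2}\Big) \to F_{1/2,1}\Big(\frac{\pi s^2}{2(2v)^2}\Big),$$

and therefore

$$\mathbf{P}\Big(\frac{\nu(n)}{\sqrt{n}} \ge x\Big) = \mathbf{P}\big(\nu(n) \ge x\sqrt{n}\big) = \mathbf{P}\big(H_{\lceil x\sqrt{n}\rceil} \le n\big) \to F_{1/2,1}\Big(\frac{\pi}{2(2vx)^2}\Big).$$

(Here for $k = \lceil x\sqrt{n}\rceil$ one has $n/k^2 \to s^2 = 1/x^2$.) Relation (8.8.24) is proved.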


Assertion (8.8.24) will clearly remain true for the number of crossings of the strip $[u, v]$, $u \ne 0$; one just has to replace $v$ with $v - u$ on the right-hand side of (8.8.24). It is also clear that (8.8.24) enables one to find the limiting distribution of the number of “simple” (not double) crossings of $[u, v]$ since the latter is equal to $2\nu(n)$ or $2\nu(n) + 1$.


Chapter 9 
Large Deviation Probabilities for Sums 
of Independent Random Variables 


Abstract The material presented in this chapter is unique to the present text. After 
an introductory discussion of the concept and importance of large deviation prob- 
abilities, Cramér’s condition is introduced and the main properties of the Cramér 
and Laplace transforms are discussed in Sect. 9.1. A separate subsection is devoted 
to an in-depth analysis of the key properties of the large deviation rate function, 
followed by Sect. 9.2 establishing the fundamental relationship between large devi- 
ation probabilities for sums of random variables and those for sums of their Cramér 
transforms, and discussing the probabilistic meaning of the rate function. Then the 
logarithmic Large Deviations Principle is established. Section 9.3 presents integro- 
local, integral and local theorems on the exact asymptotic behaviour of the large 
deviation probabilities in the so-called Cramér range of deviations. Section 9.4 is de- 
voted to analysing various types of the asymptotic behaviours of the large deviation 
probabilities for deviations at the boundary of the Cramér range that emerge under 
different assumptions on the distributions of the random summands. In Sect. 9.5, 
the behaviour of the large deviation probabilities is found in the case of heavy-tailed
distributions, namely, when the distribution tails are regularly varying at infinity.
These results are used in Sect. 9.6 to find the asymptotics of the large deviation 
probabilities beyond the Cramér range of deviations, under special assumptions on 
the distribution tails of the summands. 


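Let $\xi, \xi_1, \xi_2, \ldots$ be a sequence of independent identically distributed random variables,

$$\mathbf{E}\xi_k = 0, \quad \mathbf{E}\xi_k^2 = \sigma^2 < \infty, \quad S_n = \sum_{k=1}^{n} \xi_k.$$

Suppose that we have to evaluate the probability $\mathbf{P}(S_n \ge x)$. If $x \sim v\sqrt{n}$ as $n \to \infty$, $v = \text{const}$, then by the integral limit theorem

$$\mathbf{P}(S_n \ge x) \sim 1 - \Phi\Big(\frac{v}{\sigma}\Big) \tag{9.0.1}$$

as $n \to \infty$. But if $x \gg \sqrt{n}$, then the integral limit theorem enables one only to conclude that $\mathbf{P}(S_n \ge x) \to 0$ as $n \to \infty$, which in fact contains no quantitative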

A.A. Borovkov, Probability Theory, Universitext,
DOI 10.1007/978-1-4471-5201-9_9, © Springer-Verlag London 2013




information on the probability we are after. Essentially the same can happen for fixed but “relatively” large values of $v/\sigma$. For example, for $v/\sigma > 3$ and the values of $n$ around 100, the relative accuracy of the approximation in (9.0.1) becomes, generally speaking, bad (the true value of the left-hand side can be several times greater or smaller than that of the right-hand side). Studying the asymptotic behaviour of $\mathbf{P}(S_n \ge x)$ for $x \gg \sqrt{n}$ as $n \to \infty$, which is not known to us yet, could fill these gaps. This problem is highly relevant since questions of just this kind arise in many problems of mathematical statistics, insurance theory, the theory of queueing systems, etc. For instance, in mathematical statistics, finding small probabilities of errors of the first and second kind of statistical tests when the sample size $n$ is large leads to such problems (e.g. see [7]). In these problems, we have to find explicit functions $P(n, x)$ such that

$$\mathbf{P}(S_n \ge x) = P(n, x)\big(1 + o(1)\big) \tag{9.0.2}$$

as $n \to \infty$. Thus, unlike the case of normal approximation (9.0.1), here we are looking for approximations $P(n, x)$ with a relatively small error rather than an absolutely small error. If $P(n, x) \to 0$ in (9.0.2) as $n \to \infty$, then we will speak of the probabilities of rare events, or of the probabilities of large deviations of sums $S_n$. Deviations of the order $\sqrt{n}$ are called normal deviations.
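The loss of relative accuracy of (9.0.1) at about three standard deviations is easy to observe numerically. The sketch below is our own illustration, not taken from the text: it uses centred Bernoulli summands with an illustrative success probability $p = 0.1$ and $n = 100$, and compares the exact binomial tail with $1 - \Phi(v/\sigma)$.

```python
import math

def binomial_upper_tail(n, p, k):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p ** j * (1.0 - p) ** (n - j)
               for j in range(k, n + 1))

def normal_upper_tail(v):
    """1 - Phi(v) for the standard normal distribution function Phi."""
    return 0.5 * math.erfc(v / math.sqrt(2.0))

def compare(n, p, v_over_sigma):
    """Exact tail of the centred sum at x = v*sqrt(n) versus the normal approximation."""
    sigma = math.sqrt(p * (1.0 - p))
    x = v_over_sigma * sigma * math.sqrt(n)   # deviation of the centred sum
    k = math.ceil(n * p + x - 1e-9)           # same event for the uncentred count
    exact = binomial_upper_tail(n, p, k)
    approx = normal_upper_tail(v_over_sigma)
    return exact, approx, exact / approx

if __name__ == "__main__":
    for v in (1.0, 2.0, 3.0):
        print(v, compare(100, 0.1, v))
```

For $v/\sigma = 3$ the exact probability exceeds the normal approximation by a factor of more than two, although both numbers are small in absolute terms.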

In order to study large deviation probabilities, we will need some notions and 
assertions. 


9.1 Laplace’s and Cramér’s Transforms. The Rate Function 


9.1.1 The Cramér Condition. Laplace’s and Cramér’s Transforms 


In all the sections of this chapter, except for Sect. 9.5, the following Cramér condi- 
tion will play an important role. 


[C] There exists a $\lambda \ne 0$ such that

$$\mathbf{E}e^{\lambda\xi} = \int e^{\lambda y}\,\mathbf{F}(dy) < \infty. \tag{9.1.1}$$

We will say that the right-side (left-side) Cramér condition holds if $\lambda > 0$ ($\lambda < 0$) in (9.1.1). If (9.1.1) is valid for some negative and positive $\lambda$ (i.e. in a neighbourhood of the point $\lambda = 0$), then we will say that the two-sided Cramér condition is satisfied.

The Cramér condition can be interpreted as characterising a fast (at least exponentially fast) rate of decay of the tails $F_\pm(t)$ of the distribution $\mathbf{F}$. If, for instance, we have (9.1.1) for $\lambda > 0$, then by Chebyshev’s inequality, for $t > 0$,

$$F_+(t) := \mathbf{P}(\xi \ge t) \le e^{-\lambda t}\,\mathbf{E}e^{\lambda\xi},$$

i.e. $F_+(t)$ decreases at least exponentially fast. Conversely, if, for some $\mu > 0$, one has $F_+(t) \le ce^{-\mu t}$, $t > 0$, then, for $\lambda \in (0, \mu)$,

$$\int_0^\infty e^{\lambda y}\,\mathbf{F}(dy) = -\int_0^\infty e^{\lambda y}\,dF_+(y) = F_+(0) + \lambda\int_0^\infty e^{\lambda y}F_+(y)\,dy \le F_+(0) + c\lambda\int_0^\infty e^{(\lambda - \mu)y}\,dy = F_+(0) + \frac{c\lambda}{\mu - \lambda} < \infty.$$

Since the integral $\int_{-\infty}^0 e^{\lambda y}\,\mathbf{F}(dy)$ is finite for any $\lambda > 0$, we have $\mathbf{E}e^{\lambda\xi} < \infty$ for $\lambda \in (0, \mu)$.

The situation is similar for the left tail $F_-(t) := \mathbf{P}(\xi < -t)$ provided that (9.1.1) holds for some $\lambda < 0$.

Set

$$\lambda_+ := \sup\{\lambda : \mathbf{E}e^{\lambda\xi} < \infty\}, \quad \lambda_- := \inf\{\lambda : \mathbf{E}e^{\lambda\xi} < \infty\}.$$

Condition [C] is equivalent to $\lambda_+ > \lambda_-$. The right-side Cramér condition means that $\lambda_+ > 0$; the two-sided condition means that $\lambda_+ > 0 > \lambda_-$. Clearly, the ch.f. $\varphi(t) = \mathbf{E}e^{it\xi}$ is analytic in the complex plane in the strip $-\lambda_+ < \operatorname{Im} t < -\lambda_-$. This follows from the differentiability of $\varphi(t)$ in this region of the complex plane, since the integral $\int |ye^{ity}|\,\mathbf{F}(dy)$ for the said values of $\operatorname{Im} t$ converges uniformly in $\operatorname{Re} t$.

Here and henceforth by the Laplace transform (Laplace–Stieltjes or Laplace–Lebesgue) of the distribution $\mathbf{F}$ of the random variable $\xi$ we shall mean the function

$$\psi(\lambda) := \mathbf{E}e^{\lambda\xi} = \varphi(-i\lambda),$$

which conflicts with Sect. 7.1.1 (and the terminology of mathematical analysis), according to which the term Laplace’s transform refers to the function $\mathbf{E}e^{-\lambda\xi} = \varphi(i\lambda)$. The reason for such a slight inconsistency in terminology (only the sign of the argument differs, this changes almost nothing) is our reluctance to introduce new notation or to complicate the old notation. Nowhere below will it cause confusion.¹

As well as condition [C], we will also assume that the random variable $\xi$ is nondegenerate, i.e. $\xi \not\equiv \text{const}$ or, which is the same, $\operatorname{Var}\xi > 0$.


The main properties of Laplace’s transform. 


As was already noted in Sect. 7.1.1, Laplace’s transform, like the ch.f., uniquely 
characterises the distribution F. Moreover, it has the following properties, which 
are similar to the corresponding properties of ch.f.s (see Sect. 7.1). Under obvious 
conventions of notation, 


(Ψ1) $\psi_{a + b\xi}(\lambda) = e^{\lambda a}\psi_\xi(b\lambda)$, if $a$ and $b$ are constant.


¹In the literature, the function $\mathbf{E}e^{\lambda\xi}$ is sometimes called the “moment generating function”.




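(Ψ2) If $\xi_1, \ldots, \xi_n$ are independent and $S_n = \sum_{j=1}^{n} \xi_j$, then

$$\psi_{S_n}(\lambda) = \prod_{j=1}^{n} \psi_{\xi_j}(\lambda).$$

(Ψ3) If $\mathbf{E}|\xi|^k < \infty$ and the right-side Cramér condition is satisfied then the function $\psi_\xi$ is $k$-times right differentiable at the point $\lambda = 0$,

$$\psi_\xi^{(k)}(0) = \mathbf{E}\xi^k =: m_k$$

and, as $\lambda \downarrow 0$,

$$\psi_\xi(\lambda) = 1 + \sum_{j=1}^{k} \frac{\lambda^j}{j!}\, m_j + o(\lambda^k).$$

This also implies that, as $\lambda \downarrow 0$, the representation

$$\ln\psi_\xi(\lambda) = \sum_{j=1}^{k} \frac{\gamma_j \lambda^j}{j!} + o(\lambda^k) \tag{9.1.2}$$

holds, where $\gamma_j$ are the so-called semi-invariants (or cumulants) of order $j$ of the random variable $\xi$. One can easily verify that

$$\gamma_1 = m_1, \quad \gamma_2 = m_2^0 = \sigma^2, \quad \gamma_3 = m_3^0, \ \ldots, \tag{9.1.3}$$

where $m_k^0 = \mathbf{E}(\xi - m_1)^k$ is the central moment of order $k$.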


Definition 9.1.1 Let condition [C] be met. The Cramér transform at the point $\lambda$ of the distribution $\mathbf{F}$ is the distribution²

$$\mathbf{F}_{(\lambda)}(dy) = \frac{e^{\lambda y}\,\mathbf{F}(dy)}{\psi(\lambda)}. \tag{9.1.4}$$


²In some publications the transform (9.1.4) is also called the Esscher transform. However, the
systematic use of transform (9.1.4) for the study of large deviations was first done by Cramér. 

If we study the probabilities of large deviations of sums of random variables using the inver- 
sion formula, similarly to what was done for normal deviations in Chap. 8, then we will necessarily 
come to employ the so-called saddle-point method, which consists of moving the contour of inte- 
gration so that it passes through the so-called saddle point, at which the exponent in the integrand 
function, as we move along the imaginary axis, attains its minimum (and, along the real axis, at- 
tains its maximum; this explains the name “saddle point”). Cramér’s transform does essentially 
the same, making such a translation of the contour of integration even before applying the inver- 
sion formula, and reduces the large deviation problem to the normal deviation problem, where the 
inversion formula is not needed if we use the results of Chap. 8. It is this technique that we will 
follow in the present chapter. 




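Clearly, the distributions $\mathbf{F}$ and $\mathbf{F}_{(\lambda)}$ are mutually absolutely continuous (see Sect. 3.5 of Appendix 3) with density

$$\frac{\mathbf{F}_{(\lambda)}(dy)}{\mathbf{F}(dy)} = \frac{e^{\lambda y}}{\psi(\lambda)}.$$

Denote a random variable with distribution $\mathbf{F}_{(\lambda)}$ by $\xi_{(\lambda)}$.
The Laplace transform of the distribution $\mathbf{F}_{(\lambda)}$ is obviously equal to

$$\mathbf{E}e^{\mu\xi_{(\lambda)}} = \frac{\psi(\lambda + \mu)}{\psi(\lambda)}. \tag{9.1.5}$$

Clearly,

$$\mathbf{E}\xi_{(\lambda)} = \frac{\psi'(\lambda)}{\psi(\lambda)} = \big(\ln\psi(\lambda)\big)', \quad \mathbf{E}\xi_{(\lambda)}^2 = \frac{\psi''(\lambda)}{\psi(\lambda)},$$

$$\operatorname{Var}(\xi_{(\lambda)}) = \frac{\psi''(\lambda)}{\psi(\lambda)} - \Big(\frac{\psi'(\lambda)}{\psi(\lambda)}\Big)^2 = \big(\ln\psi(\lambda)\big)''.$$

Since $\psi''(\lambda) > 0$ and $\operatorname{Var}(\xi_{(\lambda)}) > 0$, the foregoing implies one more important property of the Laplace transform.

(Ψ4) The functions $\psi(\lambda)$ and $\ln\psi(\lambda)$ are strictly convex, and

$$\mathbf{E}\xi_{(\lambda)} = \frac{\psi'(\lambda)}{\psi(\lambda)}$$

strictly increases on $(\lambda_-, \lambda_+)$.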
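For a distribution with finitely many atoms the Cramér transform (9.1.4) and the identities above can be computed directly. The following sketch is illustrative: the atoms, probabilities and function names are our own choices, and the derivatives of $\ln\psi$ are approximated by central finite differences.

```python
import math

def laplace(values, probs, lam):
    """psi(lam) = E exp(lam * xi) for a discrete distribution."""
    return sum(p * math.exp(lam * y) for y, p in zip(values, probs))

def cramer_transform(values, probs, lam):
    """Probabilities of the Cramer transform F_(lam): weights e^{lam*y} p(y) / psi(lam)."""
    psi = laplace(values, probs, lam)
    return [p * math.exp(lam * y) / psi for y, p in zip(values, probs)]

def mean_var(values, probs):
    """Mean and variance of a discrete distribution."""
    m = sum(p * y for y, p in zip(values, probs))
    v = sum(p * (y - m) ** 2 for y, p in zip(values, probs))
    return m, v

def log_psi_derivatives(values, probs, lam, h=1e-4):
    """Finite-difference approximations of (ln psi)' and (ln psi)'' at lam."""
    f = lambda l: math.log(laplace(values, probs, l))
    d1 = (f(lam + h) - f(lam - h)) / (2.0 * h)
    d2 = (f(lam + h) - 2.0 * f(lam) + f(lam - h)) / (h * h)
    return d1, d2

if __name__ == "__main__":
    values, probs = [0.0, 1.0, 2.0, 5.0], [0.4, 0.3, 0.2, 0.1]
    for lam in (-0.5, 0.0, 0.3, 1.0):
        tilted = cramer_transform(values, probs, lam)
        print(lam, mean_var(values, tilted), log_psi_derivatives(values, probs, lam))
```

In the printed output the mean of the transformed law grows with $\lambda$, as property (Ψ4) asserts, and it agrees with $(\ln\psi)'$ up to the finite-difference error.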


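The analyticity of $\psi(\lambda)$ in the strip $\operatorname{Re}\lambda \in (\lambda_-, \lambda_+)$ can be supplemented by the following “extended” continuity property on the segment $[\lambda_-, \lambda_+]$ (in the strip $\operatorname{Re}\lambda \in [\lambda_-, \lambda_+]$).

(Ψ5) The function $\psi(\lambda)$ is continuous “inside” $[\lambda_-, \lambda_+]$, i.e. $\psi(\lambda_\pm \mp 0) = \psi(\lambda_\pm)$ (where the cases $\psi(\lambda_\pm) = \infty$ are not excluded).

Outside the segment $[\lambda_-, \lambda_+]$ such continuity, generally speaking, does not hold as, for example, is the case when $\psi(\lambda_+) < \infty$ and $\psi(\lambda_+ + 0) = \infty$, which takes place, say, for the distribution $\mathbf{F}$ with density $f(x) = cx^{-3}e^{-\lambda_+ x}$ for $x \ge 1$, $c = \text{const}$.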


9.1.2 The Large Deviation Rate Function 


Under condition [C], the large deviation rate function will play the determining role in the description of asymptotics of probabilities $\mathbf{P}(S_n \ge x)$.




Definition 9.1.2 The large deviation rate function (or, for brevity, simply the rate function) $\Lambda$ of a random variable $\xi$ is defined by

$$\Lambda(\alpha) := \sup_{\lambda}\big(\alpha\lambda - \ln\psi(\lambda)\big). \tag{9.1.6}$$

The meaning of the name will become clear later. In classical analysis, the right-hand side of (9.1.6) is known as the Legendre transform of the function $\ln\psi(\lambda)$.

Consider the function $A(\alpha, \lambda) = \alpha\lambda - \ln\psi(\lambda)$ of the supremum appearing in (9.1.6). The function $-\ln\psi(\lambda)$ is strictly concave (see property (Ψ4)), and hence so is the function $A(\alpha, \lambda)$ (note also that $A(\alpha, \lambda) = -\ln\psi_\alpha(\lambda)$, where $\psi_\alpha(\lambda) = e^{-\lambda\alpha}\psi(\lambda)$ is the Laplace transform of the distribution of the random variable $\xi - \alpha$ and, therefore, from the “qualitative point of view”, $A(\alpha, \lambda)$ possesses all the properties of the function $-\ln\psi(\lambda)$). The foregoing implies that there always exists a unique point $\lambda = \lambda(\alpha)$ (on the “extended” real line $[-\infty, \infty]$) at which the supremum in (9.1.6) is attained. As $\alpha$ grows, the value of $A(\alpha, \lambda)$ for $\lambda > 0$ increases (proportionally to $\lambda$), and for $\lambda < 0$ it decreases. Therefore, the graph of $A(\alpha, \lambda)$ as the function of $\lambda$ will, roughly speaking, “roll over” to the right as $\alpha$ grows. This means that the maximum point $\lambda(\alpha)$ will also move to the right (or stay at the same place if $\lambda(\alpha) = \lambda_+$).
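Because $A(\alpha, \lambda)$ is strictly concave in $\lambda$, the supremum in (9.1.6) can be approximated by any one-dimensional search for the maximum of a concave function. A minimal sketch follows; the ternary search, the search interval and the function names are our own choices, and the standard normal law, for which $\ln\psi(\lambda) = \lambda^2/2$, serves as a test case.

```python
def rate_function(log_psi, alpha, lo, hi, iterations=200):
    """Approximate sup over lam in [lo, hi] of alpha*lam - log_psi(lam).

    Returns the pair (approximate supremum, approximate maximiser lambda(alpha)).
    Ternary search is valid because the objective is concave in lam.
    """
    def objective(lam):
        return alpha * lam - log_psi(lam)

    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1   # the maximiser lies to the right of m1
        else:
            hi = m2   # the maximiser lies to the left of m2
    lam_star = 0.5 * (lo + hi)
    return objective(lam_star), lam_star

if __name__ == "__main__":
    normal_log_psi = lambda lam: 0.5 * lam * lam
    for alpha in (-1.0, 0.5, 1.5, 3.0):
        value, lam_star = rate_function(normal_log_psi, alpha, -50.0, 50.0)
        print(alpha, value, lam_star)
```

For the normal law the maximiser moves to the right as $\alpha$ grows and the supremum equals $\alpha^2/2$, which illustrates the “rolling over” of the graph described above.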

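We now turn to more precise formulations. On the interval $[\lambda_-, \lambda_+]$, there exists the derivative (respectively, the right and the left derivative at the endpoints $\lambda_\pm$)

$$A'_\lambda(\alpha, \lambda) = \alpha - \frac{\psi'(\lambda)}{\psi(\lambda)}. \tag{9.1.7}$$

The parameters

$$\alpha_\pm = \frac{\psi'(\lambda_\pm \mp 0)}{\psi(\lambda_\pm \mp 0)}, \quad \alpha_- < \alpha_+, \tag{9.1.8}$$

will play an important role in what follows. The value of $\alpha_+$ determines the angle at which the curve $\ln\psi(\lambda)$ “sticks” into the point $(\lambda_+, \ln\psi(\lambda_+))$. The quantity $\alpha_-$ has a similar meaning. If $\alpha \in [\alpha_-, \alpha_+]$ then the equation $A'_\lambda(\alpha, \lambda) = 0$, or (see (9.1.7))

$$\frac{\psi'(\lambda)}{\psi(\lambda)} = \alpha, \tag{9.1.9}$$

always has a unique solution $\lambda(\alpha)$ on the segment $[\lambda_-, \lambda_+]$ ($\lambda_\pm$ can be infinite). This solution $\lambda(\alpha)$, being the inverse of an analytical and strictly increasing function $\psi'(\lambda)/\psi(\lambda)$ on $(\lambda_-, \lambda_+)$ (see (9.1.9)), is also analytical and strictly increasing on $(\alpha_-, \alpha_+)$,

$$\lambda(\alpha) \uparrow \lambda_+ \ \text{as}\ \alpha \uparrow \alpha_+; \quad \lambda(\alpha) \downarrow \lambda_- \ \text{as}\ \alpha \downarrow \alpha_-. \tag{9.1.10}$$

The equalities

$$\Lambda(\alpha) = \alpha\lambda(\alpha) - \ln\psi(\lambda(\alpha)), \quad \frac{\psi'(\lambda(\alpha))}{\psi(\lambda(\alpha))} = \alpha \tag{9.1.11}$$

yield

$$\Lambda'(\alpha) = \lambda(\alpha) + \alpha\lambda'(\alpha) - \frac{\psi'(\lambda(\alpha))}{\psi(\lambda(\alpha))}\,\lambda'(\alpha) = \lambda(\alpha).$$

Recalling that

$$\frac{\psi'(0)}{\psi(0)} = m_1 = \mathbf{E}\xi, \quad 0 \in [\lambda_-, \lambda_+], \quad m_1 \in [\alpha_-, \alpha_+],$$

we obtain the following representation for the function $\Lambda$:

(Λ1) If $\alpha_0 \in [\alpha_-, \alpha_+]$, $\alpha \in [\alpha_-, \alpha_+]$ then

$$\Lambda(\alpha) = \Lambda(\alpha_0) + \int_{\alpha_0}^{\alpha} \lambda(v)\,dv. \tag{9.1.12}$$

Since $\lambda(m_1) = \Lambda(m_1) = 0$ (this follows from (9.1.9) and (9.1.11)), we obtain, in particular, for $\alpha_0 = m_1$, that

$$\Lambda(\alpha) = \int_{m_1}^{\alpha} \lambda(v)\,dv. \tag{9.1.13}$$

The functions $\lambda(\alpha)$ and $\Lambda(\alpha)$ are analytic on $(\alpha_-, \alpha_+)$.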


Now consider what happens outside the segment $[\alpha_-, \alpha_+]$. Assume for definiteness that $\lambda_+ > 0$. We will study the behaviour of the functions $\lambda(\alpha)$ and $\Lambda(\alpha)$ near the point $\alpha_+$ and for $\alpha > \alpha_+$. Similar results hold true in the vicinity of the point $\alpha_-$ in the case $\lambda_- < 0$.

First let $\lambda_+ = \infty$, i.e. the function $\ln\psi(\lambda)$ is analytic on the whole semiaxis $\lambda > 0$, and the tail $F_+(t)$ decays as $t \to \infty$ faster than any exponential function. Denote by

$$s_\pm = \pm\sup\{t : F_\pm(t) > 0\}$$

the boundaries of the support of $\mathbf{F}$. Without loss of generality, we will assume that

$$s_+ > 0, \quad s_- < 0. \tag{9.1.14}$$

This can always be achieved by shifting the random variable, similarly to our assuming, without loss of generality, $\mathbf{E}\xi = 0$ in many theorems of Chap. 8, where we used the fact that the problem of studying the distribution of $S_n$ is “invariant” with respect to a shift. (We can also note that $\Lambda_{\xi - a}(\alpha - a) = \Lambda_\xi(\alpha)$, see property (Λ4) below, and that (9.1.14) always holds provided that $\mathbf{E}\xi = 0$.)

(Λ2) (i) If $\lambda_+ = \infty$ then $\alpha_+ = s_+$.

Hence, for $s_+ = \infty$, we always have $\alpha_+ = \infty$ and so for any $\alpha \ge \alpha_-$ we are dealing with the already considered “regular” case, where (9.1.12) and (9.1.13) hold true.

(ii) If $s_+ < \infty$ then $\lambda_+ = \infty$, $\alpha_+ = s_+$,

$$\Lambda(\alpha_+) = -\ln\mathbf{P}(\xi = s_+), \quad \Lambda(\alpha) = \infty \ \text{for}\ \alpha > \alpha_+.$$

Similar assertions hold true for $s_-$, $\alpha_-$, $\lambda_-$.


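Proof (i) First let $s_+ < \infty$. Then the asymptotics of $\psi(\lambda)$ and $\psi'(\lambda)$ as $\lambda \to \infty$ is determined by the integrals in a neighbourhood of the point $s_+$: for any fixed $\varepsilon > 0$,

$$\psi(\lambda) \sim \mathbf{E}\big(e^{\lambda\xi};\ \xi > s_+ - \varepsilon\big), \quad \psi'(\lambda) \sim \mathbf{E}\big(\xi e^{\lambda\xi};\ \xi > s_+ - \varepsilon\big)$$

as $\lambda \to \infty$. Hence

$$\alpha_+ = \lim_{\lambda \to \infty} \frac{\psi'(\lambda)}{\psi(\lambda)} = \lim_{\lambda \to \infty} \frac{\mathbf{E}(\xi e^{\lambda\xi};\ \xi > s_+ - \varepsilon)}{\mathbf{E}(e^{\lambda\xi};\ \xi > s_+ - \varepsilon)} = s_+.$$

If $s_+ = \infty$, then $\ln\psi(\lambda)$ grows as $\lambda \to \infty$ faster than any linear function and therefore the derivative $(\ln\psi(\lambda))'$ increases unboundedly, $\alpha_+ = \infty$.

(ii) The first two assertions are obvious. Further, let $p_+ = \mathbf{P}(\xi = s_+) > 0$. Then

$$\psi(\lambda) \sim p_+ e^{\lambda s_+},$$

$$\alpha\lambda - \ln\psi(\lambda) = \alpha\lambda - \ln p_+ - \lambda s_+ + o(1) = (\alpha - \alpha_+)\lambda - \ln p_+ + o(1)$$

as $\lambda \to \infty$. This and (9.1.11) imply that

$$\Lambda(\alpha) = \begin{cases} -\ln p_+ & \text{for } \alpha = \alpha_+, \\ \infty & \text{for } \alpha > \alpha_+. \end{cases}$$

If $p_+ = 0$, then the relation $\psi(\lambda) = o(e^{\lambda s_+})$ as $\lambda \to \infty$ similarly implies $\Lambda(\alpha_+) = \infty$. Property (Λ2) is proved.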


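Now let $0 < \lambda_+ < \infty$. If $\alpha_+ < \infty$, then necessarily $\psi(\lambda_+) < \infty$, $\psi(\lambda_+ + 0) = \infty$ and $\psi'(\lambda_+) < \infty$ (here we mean the left derivative). If we assume that $\psi(\lambda_+) = \infty$, then $\ln\psi(\lambda_+) = \infty$, $(\ln\psi(\lambda))' \to \infty$ as $\lambda \uparrow \lambda_+$ and $\alpha_+ = \infty$, which contradicts the assumption $\alpha_+ < \infty$. Since $\psi(\lambda) = \infty$ for $\lambda > \lambda_+$, the point $\lambda(\alpha)$, having reached the value $\lambda_+$ as $\alpha$ grows, will stop at that point. So, for $\alpha \ge \alpha_+$, we have

$$\lambda(\alpha) = \lambda_+, \quad \Lambda(\alpha) = \alpha\lambda_+ - \ln\psi(\lambda_+) = \Lambda(\alpha_+) + \lambda_+(\alpha - \alpha_+). \tag{9.1.15}$$

Thus, in this case, for $\alpha \ge \alpha_+$ the function $\lambda(\alpha)$ remains constant, while $\Lambda(\alpha)$ grows linearly. Relations (9.1.12) and (9.1.13) remain true.

If $\alpha_+ = \infty$, then $\alpha < \alpha_+$ for all finite $\alpha \ge \alpha_-$, and we again deal with the “regular” case that we considered earlier (see (9.1.12) and (9.1.13)). Since $\lambda(\alpha)$ does not decrease, these relations imply the convexity of $\Lambda(\alpha)$.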

In summary, we can formulate the following property. 




(Λ3) The functions $\lambda(\alpha)$ and $\Lambda(\alpha)$ can only be discontinuous at the points $s_\pm$ and under the condition $\mathbf{P}(\xi = s_\pm) > 0$. These points separate the domain $(s_-, s_+)$ where the function $\Lambda$ is finite and continuous (in the extended sense) from the domain $\alpha \notin [s_-, s_+]$ where $\Lambda(\alpha) = \infty$. In the domain $[s_-, s_+]$ the function $\Lambda$ is convex. (If we define convexity in the “extended” sense, i.e. including infinite values as well, then $\Lambda$ is convex on the entire real line.) The function $\Lambda$ is analytic in the interval $(\alpha_-, \alpha_+)$. If $\lambda_+ < \infty$ and $\alpha_+ < \infty$, then on the half-line $(\alpha_+, \infty)$ the function $\Lambda(\alpha)$ is linear with slope $\lambda_+$; at the boundary point $\alpha_+$ the continuity of the first derivatives persists. If $\lambda_+ = \infty$, then $\Lambda(\alpha) = \infty$ on $(\alpha_+, \infty)$. The function $\Lambda(\alpha)$ possesses a similar property on $(-\infty, \alpha_-)$.

If $\lambda_- = 0$, then $\alpha_- = m_1$ and $\lambda(\alpha) = \Lambda(\alpha) = 0$ for $\alpha \le m_1$.

Indeed, since $\lambda(m_1) = 0$ and $\psi(\lambda) = \infty$ for $\lambda < \lambda_- = 0 = \lambda(m_1)$, as the value of $\alpha$ decreases to $\alpha_- = m_1$, the point $\lambda(\alpha)$, having reached the value 0, will stop, and $\lambda(\alpha) = 0$ for $\alpha \le \alpha_- = m_1$. This and the first identity in (9.1.11) also imply that $\Lambda(\alpha) = 0$ for $\alpha \le m_1$.

If $\lambda_- = \lambda_+ = 0$ (condition [C] is not met), then $\lambda(\alpha) = \Lambda(\alpha) = 0$ for all $\alpha$. This is obvious, since the value of the function under the sup sign in (9.1.6) equals $-\infty$ for all $\lambda \ne 0$. In this case the limit theorems presented in the forthcoming sections will be of little substance.

We will also need the following properties of the function A. 


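(Λ4) Under obvious notational conventions, for independent random variables $\xi$ and $\eta$, we have

$$\Lambda_{\xi + \eta}(\alpha) = \sup_{\lambda}\big(\alpha\lambda - \ln\psi_\xi(\lambda) - \ln\psi_\eta(\lambda)\big) = \inf_{\gamma}\big(\Lambda_\xi(\gamma) + \Lambda_\eta(\alpha - \gamma)\big);$$

$$\Lambda_{c\xi + b}(\alpha) = \sup_{\lambda}\big(\alpha\lambda - \lambda b - \ln\psi_\xi(\lambda c)\big) = \Lambda_\xi\Big(\frac{\alpha - b}{c}\Big).$$

Clearly, $\inf_\gamma$ in the former relation is attained at the point $\gamma$ at which $\lambda_\xi(\gamma) = \lambda_\eta(\alpha - \gamma)$. If $\xi$ and $\eta$ are identically distributed then $\gamma = \alpha/2$ and therefore

$$\Lambda_{\xi + \eta}(\alpha) = \Lambda_\xi\Big(\frac{\alpha}{2}\Big) + \Lambda_\eta\Big(\frac{\alpha}{2}\Big) = 2\Lambda_\xi\Big(\frac{\alpha}{2}\Big).$$

(Λ5) The function $\Lambda(\alpha)$ attains its minimal value 0 at the point $\alpha = \mathbf{E}\xi = m_1$. For definiteness, assume that $\alpha_+ > 0$. If $m_1 = 0$ and $\mathbf{E}|\xi^k| < \infty$, then

$$\lambda(0) = \Lambda(0) = \Lambda'(0) = 0, \quad \Lambda''(0) = \frac{1}{\gamma_2}, \quad \Lambda'''(0) = -\frac{\gamma_3}{\gamma_2^3}, \ \ldots \tag{9.1.16}$$

(In the case $\alpha_- = 0$ the right derivatives are intended.) As $\alpha \downarrow 0$, one has the representation

$$\Lambda(\alpha) = \sum_{j=2}^{k} \frac{\Lambda^{(j)}(0)}{j!}\,\alpha^j + o(\alpha^k). \tag{9.1.17}$$

The semi-invariants $\gamma_j$ were defined in (9.1.2) and (9.1.3).

If the two-sided Cramér condition is satisfied then the series expansion (9.1.17) of the function $\Lambda(\alpha)$ holds for $k = \infty$. This series is called the Cramér series.
Verifying properties (Λ4) and (Λ5) is not difficult, and is left to the reader.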


(Λ6) The following inversion formula is valid: for $\lambda \in (\lambda_-, \lambda_+)$,

$$\ln\psi(\lambda) = \sup_{\alpha}\big(\alpha\lambda - \Lambda(\alpha)\big). \tag{9.1.18}$$

This means that the rate function uniquely determines the Laplace transform $\psi(\lambda)$ and hence the distribution $\mathbf{F}$ as well. Formula (9.1.18) also means that applying the Legendre transform twice in succession to the convex function $\ln\psi(\lambda)$ leads to the same original function.

Proof We denote by $T(\lambda)$ the right-hand side of (9.1.18) and show that $T(\lambda) = \ln\psi(\lambda)$ for $\lambda \in (\lambda_-, \lambda_+)$. If, in order to find the supremum in (9.1.18), we equate to zero the derivative in $\alpha$ of the function under the sup sign, then we will get the equation

$$\lambda = \Lambda'(\alpha) = \lambda(\alpha). \tag{9.1.19}$$

Since $\lambda(\alpha)$, $\alpha \in (\alpha_-, \alpha_+)$, is the function inverse to $(\ln\psi(\lambda))'$ (see (9.1.9)), for $\lambda \in (\lambda_-, \lambda_+)$ Eq. (9.1.19) clearly has the solution

$$\alpha = a(\lambda) := \big(\ln\psi(\lambda)\big)'. \tag{9.1.20}$$

Taking into account the fact that $\lambda(a(\lambda)) \equiv \lambda$, we obtain

$$T(\lambda) = \lambda a(\lambda) - \Lambda(a(\lambda)),$$

$$T'(\lambda) = a(\lambda) + \lambda a'(\lambda) - \lambda(a(\lambda))\,a'(\lambda) = a(\lambda).$$

Since $a(0) = m_1$ and $T(0) = -\Lambda(m_1) = 0$, we have

$$T(\lambda) = \int_0^{\lambda} a(u)\,du = \ln\psi(\lambda). \tag{9.1.21}$$

The assertion is proved, and so is yet another inversion formula (the last equality in (9.1.21), which expresses $\ln\psi(\lambda)$ as the integral of the function $a(\lambda)$ inverse to $\lambda(\alpha)$).
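The inversion formula (9.1.18) can be tested numerically on a law whose rate function is known in closed form. The sketch below assumes the exponential distribution with an illustrative parameter $\beta = 2$, for which $\psi(\lambda) = \beta/(\beta - \lambda)$ and $\Lambda(\alpha) = \alpha\beta - 1 - \ln\alpha\beta$ for $\alpha > 0$; the search routine, the search interval and all names are our own.

```python
import math

BETA = 2.0  # illustrative parameter of the exponential law

def rate_exponential(alpha):
    """Lambda(alpha) = alpha*beta - 1 - ln(alpha*beta), valid for alpha > 0."""
    return alpha * BETA - 1.0 - math.log(alpha * BETA)

def sup_concave(f, lo, hi, iterations=200):
    """Maximum of a concave function on [lo, hi] by ternary search."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return f(0.5 * (lo + hi))

def log_psi_from_rate(lam):
    """Right-hand side of (9.1.18): sup over alpha > 0 of alpha*lam - Lambda(alpha)."""
    return sup_concave(lambda a: a * lam - rate_exponential(a), 1e-9, 1e3)

if __name__ == "__main__":
    for lam in (-3.0, 0.0, 1.0, 1.5):
        print(lam, log_psi_from_rate(lam), math.log(BETA / (BETA - lam)))
```

For every $\lambda < \beta$ tried, the two printed values agree to within the accuracy of the search, so the Legendre transform of $\Lambda$ returns $\ln\psi$.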


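(Λ7) The exponential Chebyshev inequality. For $\alpha \ge m_1$, we have

$$\mathbf{P}(S_n \ge \alpha n) \le e^{-n\Lambda(\alpha)}.$$

Proof If $\alpha \ge m_1$, then $\lambda(\alpha) \ge 0$. For $\lambda = \lambda(\alpha) \ge 0$, we have

$$\psi^n(\lambda) \ge \mathbf{E}\big(e^{\lambda S_n};\ S_n \ge \alpha n\big) \ge e^{\lambda\alpha n}\,\mathbf{P}(S_n \ge \alpha n);$$

$$\mathbf{P}(S_n \ge \alpha n) \le e^{-\alpha n\lambda(\alpha) + n\ln\psi(\lambda(\alpha))} = e^{-n\Lambda(\alpha)}.$$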
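The bound (Λ7) can be compared with an exact tail in the Bernoulli scheme, where $S_n$ is binomial and the rate function is the one given in Example 9.1.2. The sketch below is illustrative; the parameter values and function names are our own.

```python
import math

def bernoulli_rate(alpha, p):
    """Rate function of the Bernoulli scheme for 0 < alpha < 1."""
    return (alpha * math.log(alpha / p)
            + (1.0 - alpha) * math.log((1.0 - alpha) / (1.0 - p)))

def binomial_upper_tail(n, p, k):
    """Exact P(S_n >= k) for S_n ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p ** j * (1.0 - p) ** (n - j)
               for j in range(k, n + 1))

def chebyshev_check(n, p, alpha):
    """Exact P(S_n >= alpha*n) together with the bound exp(-n*Lambda(alpha))."""
    k = math.ceil(alpha * n - 1e-9)
    exact = binomial_upper_tail(n, p, k)
    bound = math.exp(-n * bernoulli_rate(alpha, p))
    return exact, bound

if __name__ == "__main__":
    p, alpha = 0.3, 0.5   # alpha >= m_1 = p, as (Lambda 7) requires
    for n in (20, 50, 200):
        exact, bound = chebyshev_check(n, p, alpha)
        print(n, exact, bound, -math.log(exact) / n, bernoulli_rate(alpha, p))
```

The exact tail never exceeds the bound, and $-\frac{1}{n}\ln\mathbf{P}(S_n \ge \alpha n)$ approaches $\Lambda(\alpha)$ as $n$ grows, so the exponent in the bound cannot be improved.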


We now consider a few examples, where the values of $\lambda_\pm$, $\alpha_\pm$, and the functions $\psi(\lambda)$, $\lambda(\alpha)$, $\Lambda(\alpha)$ can be calculated in an explicit form.


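Example 9.1.1 If $\xi$ has the standard normal distribution $\Phi_{0,1}$, then

$$\psi(\lambda) = e^{\lambda^2/2}, \quad |\lambda_\pm| = |\alpha_\pm| = \infty, \quad \lambda(\alpha) = \alpha, \quad \Lambda(\alpha) = \frac{\alpha^2}{2}.$$

Example 9.1.2 For the Bernoulli scheme, where $\xi$ has the distribution $\mathbf{B}_p$, we have

$$\psi(\lambda) = pe^{\lambda} + q, \quad |\lambda_\pm| = \infty, \quad \alpha_+ = 1, \quad \alpha_- = 0, \quad m_1 = \mathbf{E}\xi = p,$$

$$\lambda(\alpha) = \ln\frac{\alpha(1 - p)}{p(1 - \alpha)}, \quad \Lambda(\alpha) = \alpha\ln\frac{\alpha}{p} + (1 - \alpha)\ln\frac{1 - \alpha}{1 - p} \quad \text{for } \alpha \in (0, 1),$$

$$\Lambda(0) = -\ln(1 - p), \quad \Lambda(1) = -\ln p, \quad \Lambda(\alpha) = \infty \ \text{for}\ \alpha \notin [0, 1].$$

Thus the function $H(\alpha) = \Lambda(\alpha)$, which described large deviation probabilities for $S_n$ in the local Theorem 5.2.1 for the Bernoulli scheme, is nothing else but the rate function. Below, in Sect. 9.3, we will obtain generalisations of Theorem 5.2.1 for arbitrary arithmetic distributions.

Example 9.1.3 For the exponential distribution $\boldsymbol{\Gamma}_\beta$, we have

$$\psi(\lambda) = \frac{\beta}{\beta - \lambda}, \quad \lambda_+ = \beta, \quad \lambda_- = -\infty, \quad \alpha_+ = \infty, \quad \alpha_- = 0, \quad m_1 = \frac{1}{\beta},$$

$$\lambda(\alpha) = \beta - \frac{1}{\alpha}, \quad \Lambda(\alpha) = \alpha\beta - 1 - \ln\alpha\beta \quad \text{for } \alpha > 0.$$

Example 9.1.4 For the centred Poisson distribution with parameter $\beta$, we have

$$\psi(\lambda) = \exp\{\beta[e^{\lambda} - 1 - \lambda]\}, \quad |\lambda_\pm| = \infty, \quad \alpha_- = -\beta, \quad \alpha_+ = \infty, \quad m_1 = 0,$$

$$\lambda(\alpha) = \ln\frac{\beta + \alpha}{\beta}, \quad \Lambda(\alpha) = (\alpha + \beta)\ln\frac{\alpha + \beta}{\beta} - \alpha \quad \text{for } \alpha > -\beta.$$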



9.2 A Relationship Between Large Deviation Probabilities for 
Sums of Random Variables and Those for Sums of Their 
Cramér Transforms. The Probabilistic Meaning of the Rate 
Function 


9.2.1 A Relationship Between Large Deviation Probabilities for 
Sums of Random Variables and Those for Sums of Their 
Cramér Transforms 


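Consider the Cramér transform of $\mathbf{F}$ at the point $\lambda = \lambda(\alpha)$ for $\alpha \in [\alpha_-, \alpha_+]$ and introduce the notation $\xi^{(\alpha)} := \xi_{(\lambda(\alpha))}$,

$$S_n^{(\alpha)} := \sum_{i=1}^{n} \xi_i^{(\alpha)},$$

where $\xi_i^{(\alpha)}$ are independent copies of $\xi^{(\alpha)}$. The distribution $\mathbf{F}^{(\alpha)} := \mathbf{F}_{(\lambda(\alpha))}$ of the random variable $\xi^{(\alpha)}$ is called the Cramér transform of $\mathbf{F}$ with parameter $\alpha$. The random variables $\xi^{(\alpha)}$ are also called Cramér transforms, but of the original random variable $\xi$. The relationship between the distributions of $S_n$ and $S_n^{(\alpha)}$ is established in the following assertion.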


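Theorem 9.2.1 For $x = n\alpha$, $\alpha \in (\alpha_-, \alpha_+)$, and any $t > 0$, one has

$$\mathbf{P}\big(S_n \in [x, x + t)\big) = e^{-n\Lambda(\alpha)} \int_0^t e^{-\lambda(\alpha)z}\,\mathbf{P}\big(S_n^{(\alpha)} - \alpha n \in dz\big). \tag{9.2.1}$$

Proof The Laplace transform of the distribution of the sum $S_n^{(\alpha)}$ is clearly equal to

$$\mathbf{E}e^{\mu S_n^{(\alpha)}} = \Big[\frac{\psi(\mu + \lambda(\alpha))}{\psi(\lambda(\alpha))}\Big]^n \tag{9.2.2}$$

(see (9.1.5)). On the other hand, consider the Cramér transform $(S_n)_{(\lambda(\alpha))}$ of $S_n$ at the point $\lambda(\alpha)$. Applying (9.1.5) to the distribution of $S_n$, we obtain

$$\mathbf{E}e^{\mu (S_n)_{(\lambda(\alpha))}} = \frac{\psi^n(\mu + \lambda(\alpha))}{\psi^n(\lambda(\alpha))}.$$

Since this expression coincides with (9.2.2), the Cramér transform of $S_n$ at the point $\lambda(\alpha)$ coincides in distribution with the sum $S_n^{(\alpha)}$ of the transforms $\xi_i^{(\alpha)}$. In other words,

$$\frac{\mathbf{P}(S_n \in dv)\,e^{\lambda(\alpha)v}}{\psi^n(\lambda(\alpha))} = \mathbf{P}\big(S_n^{(\alpha)} \in dv\big) \tag{9.2.3}$$

or, which is the same,

$$\mathbf{P}(S_n \in dv) = e^{-\lambda(\alpha)v + n\ln\psi(\lambda(\alpha))}\,\mathbf{P}\big(S_n^{(\alpha)} \in dv\big) = e^{-n\Lambda(\alpha) + \lambda(\alpha)(n\alpha - v)}\,\mathbf{P}\big(S_n^{(\alpha)} \in dv\big).$$

Integrating this equality in $v$ from $x$ to $x + t$, letting $x := n\alpha$ and making the change of variables $v - n\alpha = z$, we get

$$\mathbf{P}\big(S_n \in [x, x + t)\big) = e^{-n\Lambda(\alpha)} \int_x^{x + t} e^{\lambda(\alpha)(n\alpha - v)}\,\mathbf{P}\big(S_n^{(\alpha)} \in dv\big) = e^{-n\Lambda(\alpha)} \int_0^t e^{-\lambda(\alpha)z}\,\mathbf{P}\big(S_n^{(\alpha)} - \alpha n \in dz\big).$$

The theorem is proved.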
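In the Bernoulli scheme both sides of (9.2.3) and (9.2.1) are finite sums, so the identities can be verified directly. The sketch below relies on the observation (easily checked from Example 9.1.2) that the Cramér transform of $\mathbf{B}_p$ with parameter $\alpha$ is again a Bernoulli law, with success probability $pe^{\lambda(\alpha)}/\psi(\lambda(\alpha)) = \alpha$; the parameter values and function names are illustrative.

```python
import math

def binom_pmf(n, p, k):
    """P(S_n = k) for S_n ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def tilt_parameters(p, alpha):
    """lambda(alpha), psi(lambda(alpha)) and Lambda(alpha) for the Bernoulli scheme."""
    lam = math.log(alpha * (1.0 - p) / (p * (1.0 - alpha)))
    psi = p * math.exp(lam) + 1.0 - p
    rate = alpha * lam - math.log(psi)   # first equality in (9.1.11)
    return lam, psi, rate

def identity_gap(n, p, alpha):
    """Largest difference between the two sides of (9.2.3) over k = 0..n."""
    lam, psi, _ = tilt_parameters(p, alpha)
    return max(abs(binom_pmf(n, p, k) * math.exp(lam * k) / psi ** n
                   - binom_pmf(n, alpha, k))
               for k in range(n + 1))

def interval_probability(n, p, alpha, t):
    """Both sides of (9.2.1) for integer x = n*alpha and integer t > 0."""
    lam, _, rate = tilt_parameters(p, alpha)
    x = round(n * alpha)
    ks = range(x, min(x + t, n + 1))
    direct = sum(binom_pmf(n, p, k) for k in ks)
    tilted = math.exp(-n * rate) * sum(math.exp(-lam * (k - x)) * binom_pmf(n, alpha, k)
                                       for k in ks)
    return direct, tilted

if __name__ == "__main__":
    print(identity_gap(30, 0.2, 0.6))
    print(interval_probability(30, 0.2, 0.6, 5))
```

The point $x = n\alpha$ is a large deviation for the original sum but is the mean of the transformed sum, which is the reduction to normal deviations that the theorem provides.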


Since for $\alpha \in [\alpha_-, \alpha_+]$ we have

$$\mathbf{E}\xi^{(\alpha)} = \frac{\psi'(\lambda(\alpha))}{\psi(\lambda(\alpha))} = \alpha$$

(see (9.1.11)), one has $\mathbf{E}\big(S_n^{(\alpha)} - \alpha n\big) = 0$ and so for $t \le c\sqrt{n}$ we have probabilities of normal deviations of $S_n^{(\alpha)} - \alpha n$ on the right-hand side of (9.2.1). This allows us to reduce the problem on large deviations of $S_n$ to the problem on normal deviations of $S_n^{(\alpha)}$. If $\alpha > \alpha_+$, then formula (9.2.1) is still rather useful, as will be shown in Sects. 9.4 and 9.5.


9.2.2 The Probabilistic Meaning of the Rate Function 


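In this section we will prove the following assertion, which clarifies the probabilistic meaning of the function $\Lambda(\alpha)$.

Denote by $\Delta[\alpha) := [\alpha, \alpha + \Delta)$ the interval of length $\Delta$ with the left end at the point $\alpha$. The notation $\Delta_n[\alpha)$, where $\Delta_n$ depends on $n$, will have a similar meaning.

Theorem 9.2.2 For each fixed $\alpha$ and all sequences $\Delta_n$ converging to 0 as $n \to \infty$ slowly enough, one has

$$\Lambda(\alpha) = -\lim_{n \to \infty} \frac{1}{n}\ln\mathbf{P}\Big(\frac{S_n}{n} \in \Delta_n[\alpha)\Big). \tag{9.2.4}$$

This relation can also be written as

$$\mathbf{P}\Big(\frac{S_n}{n} \in \Delta_n[\alpha)\Big) = e^{-n\Lambda(\alpha) + o(n)}.$$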
n 


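Proof of Theorem 9.2.2 First let $\alpha \in (\alpha_-, \alpha_+)$. Then

$$\mathbf{E}\xi^{(\alpha)} = \alpha, \quad \operatorname{Var}\xi^{(\alpha)} = \big(\ln\psi(\lambda)\big)''\big|_{\lambda = \lambda(\alpha)} < \infty$$

and hence, as $n \to \infty$ and $\Delta_n \to 0$ slowly enough (e.g., for $\Delta_n \ge n^{-1/3}$), by the central limit theorem we have

$$\mathbf{P}\big(S_n^{(\alpha)} - \alpha n \in [0, \Delta_n n)\big) \to 1/2.$$

Therefore, by Theorem 9.2.1 for $t = \Delta_n n$, $x = \alpha n$ and by the mean value theorem,

$$\mathbf{P}\big(S_n \in [x, x + t)\big) = \Big(\frac{1}{2} + o(1)\Big) e^{-n\Lambda(\alpha) - \lambda(\alpha)\Delta_n n\theta}, \quad \theta \in (0, 1);$$

$$\frac{1}{n}\ln\mathbf{P}\big(S_n \in [x, x + t)\big) = -\Lambda(\alpha) - \lambda(\alpha)\theta\Delta_n + o(1) = -\Lambda(\alpha) + o(1)$$

as $n \to \infty$. This proves (9.2.4) for $\alpha \in (\alpha_-, \alpha_+)$.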

The further proof is divided into three stages. 

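(1) The upper bound in the general case. Now let $\alpha$ be arbitrary and $|\lambda(\alpha)| < \infty$.
By Theorem 9.2.1 for $t = n\Delta_n$, we have

$$\mathbf{P}\Bigl(\frac{S_n}{n} \in \Delta_n[\alpha)\Bigr) \le \exp\bigl\{-n\Lambda(\alpha) + |\lambda(\alpha)|\, n\Delta_n\bigr\}.$$

If $\Delta_n \to 0$ then

$$\limsup_{n\to\infty} \frac{1}{n} \ln \mathbf{P}\Bigl(\frac{S_n}{n} \in \Delta_n[\alpha)\Bigr) \le -\Lambda(\alpha). \tag{9.2.5}$$

(This inequality can also be obtained from the exponential Chebyshev's inequality
($\Lambda$7).)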

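(2) The lower bound in the general case. Let $|\lambda(\alpha)| < \infty$ and $|s_\pm| = \infty$. Introduce
"truncated" random variables ${}^{(N)}\xi$ with the distribution

$$\mathbf{P}\bigl({}^{(N)}\xi \in B\bigr) = \frac{\mathbf{P}(\xi \in B;\ |\xi| < N)}{\mathbf{P}(|\xi| < N)} = \mathbf{P}\bigl(\xi \in B \mid |\xi| < N\bigr)$$

and endow all the symbols that correspond to ${}^{(N)}\xi$ with the left superscript $(N)$.
Then clearly, for each $\lambda$,

$$\mathbf{E}\bigl(e^{\lambda\xi};\ |\xi| < N\bigr) \uparrow \psi(\lambda), \qquad \mathbf{P}(|\xi| < N) \uparrow 1$$

as $N \to \infty$, so that

$${}^{(N)}\psi(\lambda) = \frac{\mathbf{E}(e^{\lambda\xi};\ |\xi| < N)}{\mathbf{P}(|\xi| < N)} \to \psi(\lambda).$$

The functions ${}^{(N)}\Lambda(\alpha)$ and $\Lambda(\alpha)$ are the suprema over $\lambda$ of the concave functions
$\alpha\lambda - \ln {}^{(N)}\psi(\lambda)$ and $\alpha\lambda - \ln\psi(\lambda)$, respectively. Therefore for each $\alpha$ we also have
convergence ${}^{(N)}\Lambda(\alpha) \to \Lambda(\alpha)$ as $N \to \infty$.

Further,

$$\mathbf{P}\Bigl(\frac{S_n}{n} \in \Delta_n[\alpha)\Bigr) \ge \mathbf{P}\Bigl(\frac{S_n}{n} \in \Delta_n[\alpha);\ |\xi_j| < N,\ j = 1, \ldots, n\Bigr) = \mathbf{P}^n(|\xi| < N)\, \mathbf{P}\Bigl(\frac{{}^{(N)}S_n}{n} \in \Delta_n[\alpha)\Bigr).$$

Since $s_\pm = \pm\infty$, one has ${}^{(N)}\alpha_\pm = \pm N$ and, for $N$ large enough, we have
$\alpha \in ({}^{(N)}\alpha_-, {}^{(N)}\alpha_+)$. Hence we can apply the first part of the proof of the theorem by
virtue of which, as $\Delta_n \to 0$,

$$\frac{1}{n}\ln \mathbf{P}\Bigl(\frac{{}^{(N)}S_n}{n} \in \Delta_n[\alpha)\Bigr) = -{}^{(N)}\Lambda(\alpha) + o(1),$$

$$\frac{1}{n}\ln \mathbf{P}\Bigl(\frac{S_n}{n} \in \Delta_n[\alpha)\Bigr) \ge -{}^{(N)}\Lambda(\alpha) + o(1) + \ln \mathbf{P}(|\xi| < N).$$

The right-hand side of the last inequality can be made arbitrarily close to $-\Lambda(\alpha)$ by
choosing a suitable $N$. Since the left-hand side of this inequality does not depend
on $N$, we have

$$\liminf_{n\to\infty} \frac{1}{n}\ln \mathbf{P}\Bigl(\frac{S_n}{n} \in \Delta_n[\alpha)\Bigr) \ge -\Lambda(\alpha). \tag{9.2.6}$$

Together with (9.2.5), this proves (9.2.4).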

(3) It remains to remove the restrictions stated at the beginning of stages (1) and
(2) of the proof, i.e. to consider the cases $|\lambda(\alpha)| = \infty$ and $\min|s_\pm| < \infty$. These
two relations are connected with each other since, for instance, the equality
$\lambda(\alpha) = \lambda_+ = \infty$ can only hold if $\alpha \ge \alpha_+ = s_+ < \infty$ (see property ($\Lambda$2)). For $\alpha > s_+$,
relation (9.2.4) is evident, since $\mathbf{P}(S_n/n \in \Delta_n[\alpha)) = 0$ and $\Lambda(\alpha) = \infty$. For
$\alpha = \alpha_+ = s_+$ and $p_+ = \mathbf{P}(\xi = s_+)$, we have, for any $\Delta > 0$,

$$\mathbf{P}\Bigl(\frac{S_n}{n} \in \Delta[\alpha_+)\Bigr) = \mathbf{P}(S_n = n\alpha_+) = p_+^n. \tag{9.2.7}$$

Since in this case $\Lambda(\alpha_+) = -\ln p_+$ (see ($\Lambda$2)), the equality (9.2.4) holds true.

The case $\lambda(\alpha) = \lambda_- = -\infty$ with $s_- > -\infty$ is considered in a similar way. However,
due to the asymmetry of the interval $\Delta[\alpha)$ with respect to the point $\alpha$, there
are small differences. Instead of an equality in (9.2.7) we only have the inequality

$$\mathbf{P}\Bigl(\frac{S_n}{n} \in \Delta_n[\alpha_-)\Bigr) \ge \mathbf{P}(S_n = n\alpha_-) = p_-^n, \qquad p_- = \mathbf{P}(\xi = \alpha_-). \tag{9.2.8}$$

Therefore we also have to use the exponential Chebyshev's inequality (see ($\Lambda$7))
applying it to $-S_n$ for $s_- = \alpha_- < 0$:

$$\mathbf{P}\Bigl(\frac{S_n}{n} \in \Delta_n[\alpha_-)\Bigr) \le \mathbf{P}\Bigl(\frac{S_n}{n} < \alpha_- + \Delta_n\Bigr) \le e^{-n\Lambda(\alpha_- + \Delta_n)}. \tag{9.2.9}$$

Relations (9.2.8), (9.2.9), the equality $\Lambda(\alpha_-) = -\ln p_-$, and the right continuity of
$\Lambda(\alpha)$ at the point $\alpha_-$ imply (9.2.4) for $\alpha = \alpha_-$. The theorem is proved.
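Theorem 9.2.2 lends itself to a direct numerical check when the distribution of $S_n$ is known explicitly. The following sketch is an illustration under stated assumptions, not part of the proof: the summands are taken to be Bernoulli with parameter `p`, for which the rate function has the closed form $\Lambda(\alpha) = \alpha\ln(\alpha/p) + (1-\alpha)\ln((1-\alpha)/(1-p))$ for $\alpha \in (0,1)$, and the interval length is chosen as $\Delta_n = n^{-1/3}$. All function names are hypothetical.

```python
from math import exp, lgamma, log

def log_binom_pmf(n, k, p):
    """ln P(S_n = k) for Bernoulli(p) summands, computed in log-space."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

def rate_bernoulli(alpha, p):
    """Lambda(alpha) for Bernoulli(p) summands, 0 < alpha < 1."""
    return alpha * log(alpha / p) + (1 - alpha) * log((1 - alpha) / (1 - p))

def log_prob_interval(n, alpha, delta, p):
    """ln P(S_n / n in [alpha, alpha + delta)) via a log-sum-exp over k."""
    logs = [log_binom_pmf(n, k, p) for k in range(n + 1)
            if alpha * n <= k < (alpha + delta) * n]
    m = max(logs)
    return m + log(sum(exp(v - m) for v in logs))

def empirical_rate(n, alpha, p):
    """-(1/n) ln P(S_n / n in Delta_n[alpha)) with Delta_n = n**(-1/3)."""
    delta = n ** (-1.0 / 3.0)
    return -log_prob_interval(n, alpha, delta, p) / n
```

For instance, comparing `empirical_rate(n, 0.7, 0.5)` with `rate_bernoulli(0.7, 0.5)` for growing `n` shows the gap shrinking at the rate $(\ln n)/n$, which is the $o(1)$ term in (9.2.4).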


9.2.3 The Large Deviations Principle 


It is not hard to derive from Theorem 9.2.2 a corollary on the asymptotics of the
probabilities of $S_n/n$ hitting an arbitrary Borel set. Denote by $(B)$ and $[B]$ the
interior and the closure of $B$, respectively ($(B)$ is the union of all open intervals
contained in $B$). Put

$$\Lambda(B) := \inf_{\alpha \in B} \Lambda(\alpha).$$


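Theorem 9.2.3 For any Borel set $B$, the following inequalities hold:

$$\liminf_{n\to\infty} \frac{1}{n}\ln \mathbf{P}\Bigl(\frac{S_n}{n} \in B\Bigr) \ge -\Lambda\bigl((B)\bigr), \tag{9.2.10}$$

$$\limsup_{n\to\infty} \frac{1}{n}\ln \mathbf{P}\Bigl(\frac{S_n}{n} \in B\Bigr) \le -\Lambda\bigl([B]\bigr). \tag{9.2.11}$$

If $\Lambda((B)) = \Lambda([B])$, then the following limit exists:

$$\lim_{n\to\infty} \frac{1}{n}\ln \mathbf{P}\Bigl(\frac{S_n}{n} \in B\Bigr) = -\Lambda(B). \tag{9.2.12}$$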


This assertion is called the large deviation principle. It is one of the so-called
"rough" ("logarithmic") limit theorems that describe the asymptotic behaviour of
$\ln \mathbf{P}(S_n/n \in B)$. It is usually impossible to derive from this assertion the
asymptotics of the probability $\mathbf{P}(S_n/n \in B)$ itself. (In the equality
$\mathbf{P}(S_n/n \in B) = \exp\{-n\Lambda(B) + o(n)\}$, the term $o(n)$ may grow in absolute value.)
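The "rough" character of the large deviation principle can be seen on an example where everything is explicit. The sketch below is only an illustration and rests on assumptions not made in the text: the summands are standard normal, so that $S_n/n$ is normal with variance $1/n$ and $\Lambda(\alpha) = \alpha^2/2$, and the set $B$ is a finite union of intervals $[a, b)$ with $0 \le a < b$. The function names are hypothetical.

```python
from math import erfc, log, sqrt

def rate_normal(alpha):
    """Rate function of the standard normal law: Lambda(alpha) = alpha^2 / 2."""
    return 0.5 * alpha * alpha

def prob_mean_in(n, a, b):
    """P(a <= S_n / n < b) for standard normal summands, assuming 0 <= a < b."""
    def tail(t):
        # P(S_n / n >= t) = P(N(0, 1) >= t * sqrt(n))
        return 0.5 * erfc(t * sqrt(n) / sqrt(2.0))
    return tail(a) - tail(b)

def ldp_exponent(n, intervals):
    """-(1/n) ln P(S_n / n in B), where B is the union of the given intervals."""
    total = sum(prob_mean_in(n, a, b) for a, b in intervals)
    return -log(total) / n
```

For $B = [1, 2) \cup [3, 4)$ one has $\Lambda(B) = \Lambda(1) = 1/2$, and `ldp_exponent(n, [(1.0, 2.0), (3.0, 4.0)])` approaches this value as `n` grows. The difference is of order $(\ln n)/n$, so the exponent is recovered while the pre-exponential factor of the probability is not.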


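Proof Without losing generality, we can assume that $B \subset [s_-, s_+]$ (since $\Lambda(\alpha) = \infty$
outside that domain).

We first prove (9.2.10). Let $\alpha_{(B)}$ be such that

$$\Lambda\bigl((B)\bigr) \equiv \inf_{\alpha \in (B)} \Lambda(\alpha) = \Lambda(\alpha_{(B)})$$

(recall that $\Lambda(\alpha)$ is continuous on $[s_-, s_+]$). Then there exist a sequence of points
$\alpha_k$ and a sequence of intervals $(\alpha_k - \delta_k, \alpha_k + \delta_k)$, where $\delta_k \to 0$, lying in $(B)$ and
converging to the point $\alpha_{(B)}$, such that

$$\Lambda\bigl((B)\bigr) = \inf_k \Lambda\bigl((\alpha_k - \delta_k, \alpha_k + \delta_k)\bigr).$$

Here clearly

$$\inf_k \Lambda\bigl((\alpha_k - \delta_k, \alpha_k + \delta_k)\bigr) = \inf_k \Lambda(\alpha_k),$$

and for a given $\varepsilon > 0$, there exists a $k = K$ such that $\Lambda(\alpha_K) < \Lambda((B)) + \varepsilon$.

Since $\Delta_n[\alpha_K) \subset (\alpha_K - \delta_K, \alpha_K + \delta_K)$ for large enough $n$ (here $\Delta_n[\alpha_K)$ is from
Theorem 9.2.2), we have by Theorem 9.2.2 that, as $n \to \infty$,

$$\begin{aligned}
\frac{1}{n}\ln \mathbf{P}\Bigl(\frac{S_n}{n} \in B\Bigr) &\ge \frac{1}{n}\ln \mathbf{P}\Bigl(\frac{S_n}{n} \in (B)\Bigr) \\
&\ge \frac{1}{n}\ln \mathbf{P}\Bigl(\frac{S_n}{n} \in (\alpha_K - \delta_K, \alpha_K + \delta_K)\Bigr) \\
&\ge \frac{1}{n}\ln \mathbf{P}\Bigl(\frac{S_n}{n} \in \Delta_n[\alpha_K)\Bigr) \ge -\Lambda(\alpha_K) + o(1) \\
&\ge -\Lambda\bigl((B)\bigr) - \varepsilon + o(1).
\end{aligned}$$

As the left-hand side of this inequality does not depend on $\varepsilon$, inequality (9.2.10) is
proved.

We now prove inequality (9.2.11). Denote by $\alpha_{[B]}$ the point at which
$\inf_{\alpha \in [B]} \Lambda(\alpha) = \Lambda(\alpha_{[B]})$ is attained (this point always belongs to $[B]$ since $[B]$
is closed). If $\Lambda(\alpha_{[B]}) = 0$, then the inequality is evident. Now let $\Lambda(\alpha_{[B]}) > 0$. By
convexity of $\Lambda$ the equation $\Lambda(\alpha) = \Lambda(\alpha_{[B]})$ can have a second solution $\alpha'_{[B]}$.
Assume it exists and, for definiteness, $\alpha'_{[B]} < \alpha_{[B]}$. The relation $\Lambda([B]) = \Lambda(\alpha_{[B]})$
means that the set $[B]$ does not intersect with $(\alpha'_{[B]}, \alpha_{[B]})$ and

$$\mathbf{P}\Bigl(\frac{S_n}{n} \in B\Bigr) \le \mathbf{P}\Bigl(\frac{S_n}{n} \in [B]\Bigr) \le \mathbf{P}\Bigl(\frac{S_n}{n} \le \alpha'_{[B]}\Bigr) + \mathbf{P}\Bigl(\frac{S_n}{n} \ge \alpha_{[B]}\Bigr). \tag{9.2.13}$$

Moreover, in this case $m_1 \in (\alpha'_{[B]}, \alpha_{[B]})$ and each of the probabilities on the right-hand
side of (9.2.13) can be bounded using the exponential Chebyshev's inequality
(see ($\Lambda$7)) by the value $e^{-n\Lambda(\alpha_{[B]})}$. This implies (9.2.11).

If the second solution $\alpha'_{[B]}$ does not exist, then one of the summands on the right-hand
side of (9.2.13) equals zero, and we obtain the same result.

The second assertion of the theorem (Eq. (9.2.12)) is evident.
The theorem is proved.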


Using Theorem 9.2.3, we can complement Theorem 9.2.2 with the following 
assertion. 


Corollary 9.2.1 The following limit always exists:

$$\lim_{\Delta\to 0}\lim_{n\to\infty} \frac{1}{n}\ln \mathbf{P}\Bigl(\frac{S_n}{n} \in \Delta[\alpha)\Bigr) = -\Lambda(\alpha). \tag{9.2.14}$$

Proof Take the set $B$ in Theorem 9.2.3 to be the interval $B = \Delta[\alpha)$. If $\alpha \notin [s_-, s_+]$
then the assertion is obvious (since both sides of (9.2.14) are equal to $-\infty$). If
$\alpha = s_\pm$ then (9.2.14) is already proved in (9.2.7), (9.2.8) and (9.2.9).

It remains to consider points $\alpha \in (s_-, s_+)$. For such $\alpha$, the function $\Lambda(\alpha)$ is
continuous and $\alpha + \Delta$ is also a point of continuity of $\Lambda$ for $\Delta$ small enough, and hence

$$\Lambda\bigl((B)\bigr) = \Lambda\bigl([B]\bigr) \to \Lambda(\alpha)$$

as $\Delta \to 0$. Therefore by Theorem 9.2.3 the inner limit in (9.2.14) exists and
converges to $-\Lambda(\alpha)$ as $\Delta \to 0$.
The corollary is proved.


Note that the assertions of Theorems 9.2.2 and 9.2.3 and their corollaries are
"universal": they contain no restrictions on the distribution $\mathbf{F}$.


9.3 Integro-Local, Integral and Local Theorems on Large 
Deviation Probabilities in the Cramér Range 


9.3.1 Integro-Local and Integral Theorems 


In this subsection, under the assumption that the Cramér condition $\lambda_+ > 0$ is met,
we will find the asymptotics of probabilities $\mathbf{P}(S_n \in \Delta[x))$ for scaled deviations
$\alpha = x/n$ from the so-called Cramér (or regular) range, i.e. for the range $\alpha \in (\alpha_-, \alpha_+)$
in which the rate function $\Lambda(\alpha)$ is analytic.

In the non-lattice case, in addition to the condition $\lambda_+ > 0$, we will assume
without loss of generality that $\mathbf{E}\xi = 0$. In this case necessarily

$$\alpha_- \le 0, \qquad \alpha_+ = \frac{\psi'(\lambda_+)}{\psi(\lambda_+)} > 0, \qquad \lambda(0) = 0.$$

The length $\Delta$ of the interval may depend on $n$ in some cases. In such cases, we will
write $\Delta_n$ instead of $\Delta$, as we did earlier. The value

$$\sigma_\alpha^2 = \frac{\psi''(\lambda(\alpha))}{\psi(\lambda(\alpha))} - \alpha^2 \tag{9.3.1}$$

is clearly equal to $\operatorname{Var}(\xi^{(\alpha)})$ (see (9.1.5) and the definition of $\xi^{(\alpha)}$ in Sect. 9.2).




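Theorem 9.3.1 Let $\lambda_+ > 0$, $\alpha \in [0, \alpha_+)$, $\xi$ be a non-lattice random variable,
$\mathbf{E}\xi = 0$ and $\mathbf{E}\xi^2 < \infty$. If $\Delta_n \to 0$ slowly enough as $n \to \infty$, then

$$\mathbf{P}\bigl(S_n \in \Delta_n[x)\bigr) = \frac{\Delta_n}{\sigma_\alpha\sqrt{2\pi n}}\, e^{-n\Lambda(\alpha)}\bigl(1 + o(1)\bigr), \tag{9.3.2}$$

where $\alpha = x/n$ and the remainder term $o(1)$ is uniform
in $\alpha \in [0, \alpha_1]$ for any fixed $\alpha_1 \in (0, \alpha_+)$.
A similar assertion is valid in the case when $\lambda_- < 0$ and $\alpha \in (\alpha_-, 0]$.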


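Proof The proof is based on Theorems 9.2.1 and 8.7.1A. Since the conditions of
Theorem 9.2.1 are satisfied, we have

$$\mathbf{P}\bigl(S_n \in \Delta_n[x)\bigr) = e^{-n\Lambda(\alpha)} \int_0^{\Delta_n} e^{-\lambda(\alpha) z}\, \mathbf{P}\bigl(S_n^{(\alpha)} - \alpha n \in dz\bigr).$$

As $\lambda(\alpha) \le \lambda(\alpha_+ - \varepsilon) < \infty$ and $\Delta_n \to 0$, one has $e^{-\lambda(\alpha) z} \to 1$ uniformly in
$z \in \Delta_n[0)$ and hence, as $n \to \infty$,

$$\mathbf{P}\bigl(S_n \in \Delta_n[x)\bigr) = e^{-n\Lambda(\alpha)}\, \mathbf{P}\bigl(S_n^{(\alpha)} - \alpha n \in \Delta_n[0)\bigr)\bigl(1 + o(1)\bigr) \tag{9.3.3}$$

uniformly in $\alpha \in [0, \alpha_+ - \varepsilon]$.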

We now show that Theorem 8.7.1A is applicable to the random variables
$\xi^{(\alpha)} = \xi_{(\lambda(\alpha))}$. That $\sigma_\alpha = \sigma(\lambda(\alpha))$ is bounded away from 0 and from $\infty$ for $\alpha \in [0, \alpha_1]$ is
evident. (The same is true of all the theorems in this section.) Therefore, it remains
to verify whether conditions (a) and (b) of Theorem 8.7.1A are met for
$\lambda = \lambda(\alpha) \in [0, \lambda_1]$, $\lambda_1 := \lambda(\alpha_1) < \lambda_+$ and $\varphi_{(\lambda)}(t) = \psi(\lambda + it)/\psi(\lambda)$ (see (9.1.5)). We have

$$\psi(\lambda + it) = \psi(\lambda) + it\psi'(\lambda) - \frac{t^2}{2}\psi''(\lambda) + o(t^2)$$

as $t \to 0$, where the remainder term is uniform in $\lambda$ if the function $\psi''(\lambda + iu)$ is
uniformly continuous in $u$. The required uniform continuity can easily be proved
by imitating the corresponding result for ch.f.s (see property 4 in Sect. 7.1). This
proves condition (a) in Theorem 8.7.1A with 


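Now we will verify condition (b) in Theorem 8.7.1A. Assume the contrary: there
exists a sequence $\lambda_k \in [0, \lambda_1]$ such that

$$q_{\lambda_k} := \sup_{\theta_1 \le |t| \le \theta_2} \frac{|\psi(\lambda_k + it)|}{\psi(\lambda_k)} \to 1$$

as $k \to \infty$. By the uniform continuity of $\psi$ in that domain, there exist points
$t_k \in [\theta_1, \theta_2]$ such that, as $k \to \infty$,

$$\frac{|\psi(\lambda_k + it_k)|}{\psi(\lambda_k)} \to 1.$$

Since the region $\lambda \in [0, \lambda_1]$, $|t| \in [\theta_1, \theta_2]$ is compact, there exists a subsequence
$(\lambda_{k'}, t_{k'}) \to (\lambda_0, t_0)$ as $k' \to \infty$. Again using the continuity of $\psi$, we obtain the
equality

$$\frac{|\psi(\lambda_0 + it_0)|}{\psi(\lambda_0)} = 1, \tag{9.3.4}$$

which contradicts the non-latticeness of $\xi_{(\lambda_0)}$. Property (b) is proved.

Thus we can now apply Theorem 8.7.1A to the probability on the right-hand side
of (9.3.3). Since $\mathbf{E}\xi^{(\alpha)} = \alpha$ and $\mathbf{E}(\xi^{(\alpha)})^2 = \psi''(\lambda(\alpha))/\psi(\lambda(\alpha))$, this yields

$$\mathbf{P}\bigl(S_n \in \Delta_n[x)\bigr) = e^{-n\Lambda(\alpha)}\Bigl(\frac{\Delta_n}{\sigma_\alpha\sqrt{n}}\,\phi(0) + o\Bigl(\frac{\Delta_n}{\sqrt{n}}\Bigr)\Bigr) = \frac{\Delta_n}{\sigma_\alpha\sqrt{2\pi n}}\, e^{-n\Lambda(\alpha)}\bigl(1 + o(1)\bigr) \tag{9.3.5}$$

uniformly in $\alpha \in [0, \alpha_1]$ (or in $x \in [0, \alpha_1 n]$), where $\phi$ is the standard normal
density, $\phi(0) = 1/\sqrt{2\pi}$, and the values of

$$\sigma_\alpha^2 = \mathbf{E}\bigl(\xi^{(\alpha)} - \alpha\bigr)^2 = \frac{\psi''(\lambda(\alpha))}{\psi(\lambda(\alpha))} - \alpha^2$$

are bounded away from 0 and from $\infty$. The theorem is proved.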


From Theorem 9.3.1 we can now derive integro-local theorems and integral
theorems for fixed or growing $\Delta$. Since in the normal deviation range (when $x$ is
comparable with $\sqrt{n}$) we have already obtained such results, to simplify the exposition
we will consider here large deviations only, when $x \gg \sqrt{n}$ or, which is the same,
$\alpha = x/n \gg 1/\sqrt{n}$. To be more precise, we will assume that there exists a function
$N(n) \to \infty$, $N(n) = o(\sqrt{n})$ as $n \to \infty$, such that $x \ge N(n)\sqrt{n}$ ($\alpha \ge N(n)/\sqrt{n}$).


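Theorem 9.3.2 Let $\lambda_+ > 0$, $\alpha \in [0, \alpha_+)$, $\xi$ be non-lattice, $\mathbf{E}\xi = 0$ and $\mathbf{E}\xi^2 < \infty$.
Then, for any $\Delta \ge \Delta_0 > 0$, $x \ge N(n)\sqrt{n}$, $N(n) \to \infty$ as $n \to \infty$, one has

$$\mathbf{P}\bigl(S_n \in \Delta[x)\bigr) = \frac{e^{-n\Lambda(\alpha)}}{\sigma_\alpha\lambda(\alpha)\sqrt{2\pi n}}\bigl(1 - e^{-\lambda(\alpha)\Delta}\bigr)\bigl(1 + o(1)\bigr), \tag{9.3.6}$$

$o(1)$ being uniform in $\alpha = x/n \in [N(n)/\sqrt{n}, \alpha_1]$ and $\Delta \ge \Delta_0$ for each fixed
$\alpha_1 \in (0, \alpha_+)$.
In particular (for $\Delta = \infty$),

$$\mathbf{P}(S_n \ge x) = \frac{e^{-n\Lambda(\alpha)}}{\sigma_\alpha\lambda(\alpha)\sqrt{2\pi n}}\bigl(1 + o(1)\bigr). \tag{9.3.7}$$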




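Proof Partition the interval $\Delta[x)$ into subintervals $\Delta_n[x + k\Delta_n)$, $k = 0, \ldots,
\Delta/\Delta_n - 1$, where $\Delta_n \to 0$ and, for simplicity, we assume that $M = \Delta/\Delta_n$ is an
integer. Then, by Theorem 9.2.1, as $\Delta_n \to 0$,

$$\begin{aligned}
\mathbf{P}\bigl(S_n \in \Delta_n[x + k\Delta_n)\bigr)
&= \mathbf{P}\bigl(S_n \in [x, x + (k+1)\Delta_n)\bigr) - \mathbf{P}\bigl(S_n \in [x, x + k\Delta_n)\bigr) \\
&= e^{-n\Lambda(\alpha)} \int_{k\Delta_n}^{(k+1)\Delta_n} e^{-\lambda(\alpha) z}\, \mathbf{P}\bigl(S_n^{(\alpha)} - \alpha n \in dz\bigr) \\
&= e^{-n\Lambda(\alpha) - \lambda(\alpha) k\Delta_n}\, \mathbf{P}\bigl(S_n^{(\alpha)} - \alpha n \in \Delta_n[k\Delta_n)\bigr)\bigl(1 + o(1)\bigr)
\end{aligned} \tag{9.3.8}$$

uniformly in $\alpha \in [0, \alpha_1]$. Here, similarly to (9.3.5), by Theorem 8.7.1A we have

$$\mathbf{P}\bigl(S_n^{(\alpha)} - \alpha n \in \Delta_n[k\Delta_n)\bigr) = \frac{\Delta_n}{\sigma_\alpha\sqrt{n}}\,\phi\Bigl(\frac{k\Delta_n}{\sigma_\alpha\sqrt{n}}\Bigr) + o\Bigl(\frac{\Delta_n}{\sqrt{n}}\Bigr) \tag{9.3.9}$$

uniformly in $k$ and $\alpha$. Since

$$\mathbf{P}\bigl(S_n \in \Delta[x)\bigr) = \sum_{k=0}^{M-1} \mathbf{P}\bigl(S_n \in \Delta_n[x + k\Delta_n)\bigr),$$

substituting the values (9.3.8) and (9.3.9) into the right-hand side of the last equality,
we obtain

$$\begin{aligned}
\mathbf{P}\bigl(S_n \in \Delta[x)\bigr)
&= \frac{e^{-n\Lambda(\alpha)}}{\sigma_\alpha\sqrt{n}} \sum_{k=0}^{M-1} \Delta_n e^{-\lambda(\alpha) k\Delta_n}\Bigl(\phi\Bigl(\frac{k\Delta_n}{\sigma_\alpha\sqrt{n}}\Bigr) + o(1)\Bigr) \\
&= \frac{e^{-n\Lambda(\alpha)}}{\sigma_\alpha\sqrt{n}} \int_0^{\Delta - \Delta_n} e^{-\lambda(\alpha) z}\Bigl(\phi\Bigl(\frac{z}{\sigma_\alpha\sqrt{n}}\Bigr) + o(1)\Bigr)\, dz.
\end{aligned} \tag{9.3.10}$$

After the variable change $\lambda(\alpha) z = u$, the right-hand side can be rewritten as

$$\frac{e^{-n\Lambda(\alpha)}}{\sigma_\alpha\lambda(\alpha)\sqrt{n}} \int_0^{(\Delta - \Delta_n)\lambda(\alpha)} e^{-u}\Bigl(\phi\Bigl(\frac{u}{\sigma_\alpha\lambda(\alpha)\sqrt{n}}\Bigr) + o(1)\Bigr)\, du, \tag{9.3.11}$$

where the remainder term $o(1)$ is uniform in $\alpha \in [0, \alpha_1]$, $\Delta \ge \Delta_0$, and $u$ from the
integration range. Since $\lambda(\alpha) \sim \alpha/\sigma^2$ for small $\alpha$ (see (9.1.12) and (9.1.16)), for
$\alpha \ge N(n)/\sqrt{n}$ we have

$$\lambda(\alpha) \ge \frac{N(n)}{\sigma^2\sqrt{n}}\bigl(1 + o(1)\bigr), \qquad \sigma_\alpha\lambda(\alpha)\sqrt{n} \to \infty.$$

Therefore, for any fixed $u$, one has

$$\phi\Bigl(\frac{u}{\sigma_\alpha\lambda(\alpha)\sqrt{n}}\Bigr) \to \phi(0) = \frac{1}{\sqrt{2\pi}}.$$

Moreover, $\phi(v) \le 1/\sqrt{2\pi}$ for all $v$. Hence, by (9.3.10) and (9.3.11),

$$\begin{aligned}
\mathbf{P}\bigl(S_n \in \Delta[x)\bigr)
&= \frac{e^{-n\Lambda(\alpha)}}{\sigma_\alpha\lambda(\alpha)\sqrt{2\pi n}} \int_0^{\lambda(\alpha)\Delta} e^{-u}\, du\,\bigl(1 + o(1)\bigr) \\
&= \frac{e^{-n\Lambda(\alpha)}}{\sigma_\alpha\lambda(\alpha)\sqrt{2\pi n}}\bigl(1 - e^{-\lambda(\alpha)\Delta}\bigr)\bigl(1 + o(1)\bigr)
\end{aligned}$$

uniformly in $\alpha \in [0, \alpha_1]$ and $\Delta \ge \Delta_0$. Relation (9.3.7) clearly follows from (9.3.6)
with $\Delta = \infty$. The theorem is proved.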
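The accuracy of the integral asymptotics (9.3.7) can be examined on an example where the exact tail is available. The sketch below is an illustration under assumptions that are not part of the text: the summands are $\xi = \eta - 1$ with $\eta$ exponentially distributed with parameter 1, so that $\psi(\lambda) = e^{-\lambda}/(1-\lambda)$, $\lambda_+ = 1$, and a direct computation gives $\lambda(\alpha) = \alpha/(1+\alpha)$, $\Lambda(\alpha) = \alpha - \ln(1+\alpha)$ and $\sigma_\alpha = 1 + \alpha$. In this case $S_n + n$ has the gamma distribution with integer shape $n$, whose tail is a finite Poisson-type sum. The function names are hypothetical.

```python
from math import exp, lgamma, log, pi, sqrt

def log_exact_tail(n, alpha):
    """ln P(S_n >= alpha * n) for S_n = (sum of n Exp(1) variables) - n.

    Uses P(Gamma(n, 1) >= y) = exp(-y) * sum_{k < n} y**k / k!, y = n(1 + alpha),
    evaluated in log-space to avoid overflow.
    """
    y = n * (1.0 + alpha)
    logs = [k * log(y) - y - lgamma(k + 1) for k in range(n)]
    m = max(logs)
    return m + log(sum(exp(v - m) for v in logs))

def log_cramer_tail(n, alpha):
    """Logarithm of the right-hand side of (9.3.7) without the 1 + o(1) factor."""
    lam = alpha / (1.0 + alpha)        # lambda(alpha)
    rate = alpha - log(1.0 + alpha)    # Lambda(alpha)
    sigma = 1.0 + alpha                # sigma_alpha
    return -n * rate - log(sigma * lam * sqrt(2.0 * pi * n))

def tail_ratio(n, alpha):
    """Exact tail divided by the approximation (9.3.7); it tends to 1 as n grows."""
    return exp(log_exact_tail(n, alpha) - log_cramer_tail(n, alpha))
```

In contrast to the logarithmic statements of Sect. 9.2, here the ratio of the two probabilities themselves is close to 1, with a relative error of order $1/n$ for fixed $\alpha > 0$.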


Note that if $\mathbf{E}|\xi|^k < \infty$ (for $\lambda_+ > 0$ this is a restriction on the rate of decay of the
left tails $\mathbf{P}(\xi < -t)$, $t > 0$), then expansion (9.1.17) is valid and, for deviations
$x = o(n)$ ($\alpha = o(1)$) such that $n\alpha^k = x^k/n^{k-1} \le c = \text{const}$, we can change the exponent
$n\Lambda(\alpha)$ in (9.3.6) and (9.3.7) to

$$n\Lambda(\alpha) = n\sum_{j=2}^{k} \frac{\Lambda^{(j)}(0)}{j!}\,\alpha^j + o(n\alpha^k), \tag{9.3.12}$$

where $\Lambda^{(j)}(0)$ are found in (9.1.16). For $k = 3$, the foregoing implies the following.

Corollary 9.3.1 Let $\lambda_+ > 0$, $\mathbf{E}|\xi|^3 < \infty$, $\xi$ be non-lattice, $\mathbf{E}\xi = 0$, $\mathbf{E}\xi^2 = \sigma^2$,
$x \gg \sqrt{n}$ and $x = o(n^{2/3})$ as $n \to \infty$. Then

$$\mathbf{P}(S_n \ge x) \sim \frac{\sigma\sqrt{n}}{x\sqrt{2\pi}} \exp\Bigl\{-\frac{x^2}{2n\sigma^2}\Bigr\} \sim \Phi\Bigl(-\frac{x}{\sigma\sqrt{n}}\Bigr). \tag{9.3.13}$$

In the last relation we used the symmetry of the standard normal law, i.e. the
equality $1 - \Phi(t) = \Phi(-t)$. Assertion (9.3.13) shows that in the case $\lambda_+ > 0$ and
$\mathbf{E}|\xi|^3 < \infty$ the asymptotic equivalence

$$\mathbf{P}(S_n \ge x) \sim \Phi\Bigl(-\frac{x}{\sigma\sqrt{n}}\Bigr)$$

persists outside the range of normal deviations as well, up to the values
$x = o(n^{2/3})$. If $\mathbf{E}\xi^3 = 0$ and $\mathbf{E}\xi^4 < \infty$, then this equivalence holds true up to the
values $x = o(n^{3/4})$. For larger $x$ this equivalence, generally speaking, no longer
holds.


Proof of Corollary 9.3.1 The first relation in (9.3.13) follows from Theorem 9.3.2
and (9.3.12). The second follows from the asymptotic equivalence

$$\int_x^\infty e^{-u^2/2}\, du \sim \frac{e^{-x^2/2}}{x} \quad \text{as } x \to \infty,$$

which is easy to establish, using, for example, l'Hospital's rule.
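The asymptotic equivalence used in the proof is easy to observe numerically. The following sketch is an illustration only: it expresses the integral through the complementary error function, $\int_x^\infty e^{-u^2/2}\,du = \sqrt{\pi/2}\,\operatorname{erfc}(x/\sqrt{2})$, and divides it by $e^{-x^2/2}/x$. The function names are hypothetical.

```python
from math import erfc, exp, pi, sqrt

def gaussian_tail_integral(x):
    """Integral of exp(-u^2 / 2) over [x, infinity), via erfc."""
    return sqrt(pi / 2.0) * erfc(x / sqrt(2.0))

def tail_equivalence_ratio(x):
    """Ratio of the integral to exp(-x^2 / 2) / x; it tends to 1 as x grows."""
    return gaussian_tail_integral(x) / (exp(-x * x / 2.0) / x)
```

The ratio stays below 1 and differs from 1 by approximately $1/x^2$, so the equivalence is already accurate to about one per cent at $x = 10$.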




9.3.2 Local Theorems 


In this subsection we will obtain analogues of the local Theorems 8.7.2 and 8.7.3 for 
large deviations in the Cramér range. To simplify the exposition, we will formulate 
the theorem for densities, assuming that the following condition is satisfied: 


[D] The distribution $\mathbf{F}$ has a bounded density $f(x)$ such that

$$f(x) = e^{-\lambda_+ x + o(x)} \quad \text{as } x \to \infty, \quad \text{if } \lambda_+ < \infty; \tag{9.3.14}$$

$$f(x) \le c e^{-\lambda x} \quad \text{for any fixed } \lambda > 0,\ c = c(\lambda), \quad \text{if } \lambda_+ = \infty. \tag{9.3.15}$$

Since inequalities of the form (9.3.14) and (9.3.15) always hold, by the exponential
Chebyshev inequality, for the right tails

$$F_+(x) = \int_x^\infty f(u)\, du,$$

condition [D] is not too restrictive. It only eliminates sharp "bursts" of $f(x)$ as
$x \to \infty$.
Denote by $f_n(x)$ the density of the distribution of $S_n$.


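Theorem 9.3.3 Let

$$\mathbf{E}\xi = 0, \qquad \mathbf{E}\xi^2 < \infty, \qquad \lambda_+ > 0, \qquad \alpha = \frac{x}{n} \in [0, \alpha_+),$$

and condition [D] be met. Then

$$f_n(x) = \frac{e^{-n\Lambda(\alpha)}}{\sigma_\alpha\sqrt{2\pi n}}\bigl(1 + o(1)\bigr),$$

where the remainder term $o(1)$ is uniform in $\alpha \in [0, \alpha_1]$ for any fixed $\alpha_1 \in (0, \alpha_+)$.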


Proof The proof is based on Theorems 9.2.1 and 8.7.2A. Denote by $f_n^{(\alpha)}(x)$ the
density of the distribution of $S_n^{(\alpha)}$. Relation (9.2.3) implies that, for $x = \alpha n$,
$\alpha \in [\alpha_-, \alpha_+]$, we have

$$f_n(x) = e^{-\lambda(\alpha) x}\,\psi^n(\lambda(\alpha))\, f_n^{(\alpha)}(x) = e^{-n\Lambda(\alpha)} f_n^{(\alpha)}(x). \tag{9.3.16}$$

Since $\mathbf{E}\xi^{(\alpha)} = \alpha$, we see that $\mathbf{E}(S_n^{(\alpha)} - x) = 0$ and the density value $f_n^{(\alpha)}(x)$
coincides with the density of the distribution of the sum $S_n^{(\alpha)} - \alpha n$ at the point 0. In
order to use Theorems 8.7.1A and 8.7.2A, we have to verify conditions (a) and (b)
for $\theta_2 = \infty$ in these theorems and also the uniform boundedness in $\alpha \in [0, \alpha_1]$ of

$$\int \bigl|\varphi_{(\lambda(\alpha))}(t)\bigr|^m\, dt \tag{9.3.17}$$

for some integer $m \ge 1$, where $\varphi_{(\lambda(\alpha))}$ is the ch.f. of $\xi^{(\alpha)}$ (the uniform version of
condition (c) in Theorem 8.7.2). By condition [D] the density

$$f^{(\alpha)}(v) = \frac{e^{\lambda(\alpha) v} f(v)}{\psi(\lambda(\alpha))}$$

is bounded uniformly in $\alpha \in [0, \alpha_1]$ (for such $\alpha$ one has $\lambda(\alpha) \in [0, \lambda_1]$,
$\lambda_1 = \lambda(\alpha_1) < \lambda_+$). Hence the integral

$$\int \bigl(f^{(\alpha)}(v)\bigr)^2\, dv$$

is also uniformly bounded, and so, by virtue of Parseval's identity (see Sect. 7.2), is
the integral

$$\int \bigl|\varphi_{(\lambda(\alpha))}(t)\bigr|^2\, dt.$$

This means that the required uniform boundedness of integral (9.3.17) is proved
for $m = 2$.

Conditions (a) and (b) for $\theta_2 < \infty$ were verified in the proof of Theorem 9.3.1. It
remains to extend the verification of condition (b) to the case $\theta_2 = \infty$. This can be
done by following an argument very similar to the one used in the proof of
Theorem 9.3.1 in the case of finite $\theta_2$. Let $\theta_2 = \infty$. If we assume that there exist sequences
$\lambda_k \in [0, \lambda_{+,\varepsilon}]$ and $|t_k| \ge \theta_1$ such that

$$\frac{|\psi(\lambda_k + it_k)|}{\psi(\lambda_k)} \to 1,$$

then, by compactness of $[0, \lambda_{+,\varepsilon}]$, there will exist sequences $\lambda'_k \to \lambda_0 \in [0, \lambda_{+,\varepsilon}]$
and $t'_k$ such that

$$\frac{|\psi(\lambda'_k + it'_k)|}{\psi(\lambda'_k)} \to 1. \tag{9.3.18}$$

But by virtue of condition [D] the family of functions $\psi(\lambda + it)$, $t \in \mathbb{R}$, is
equicontinuous in $\lambda \in [0, \lambda_{+,\varepsilon}]$. Therefore, along with (9.3.18), we also have convergence

$$\frac{|\psi(\lambda_0 + it'_k)|}{\psi(\lambda_0)} \to 1, \qquad |t'_k| \ge \theta_1 > 0,$$

which contradicts the inequality

$$\sup_{|t| \ge \theta_1} \frac{|\psi(\lambda_0 + it)|}{\psi(\lambda_0)} < 1$$

that follows from the existence of density.

Thus property (b) is proved for $\theta_2 = \infty$, and we can use Theorem 8.7.2A, which
implies that

$$f_n^{(\alpha)}(x) = \frac{1}{\sigma(\lambda(\alpha))\sqrt{2\pi n}}\bigl(1 + o(1)\bigr).$$

This, together with (9.3.16), proves Theorem 9.3.3.
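The local asymptotics of Theorem 9.3.3 can be compared with an explicit density. The sketch below is an illustration under assumptions not made in the text: the summands are $\xi = \eta - 1$ with $\eta$ exponential with parameter 1, whose density $e^{-(x+1)}$, $x \ge -1$, is bounded and satisfies (9.3.14) with $\lambda_+ = 1$. For this law $\Lambda(\alpha) = \alpha - \ln(1+\alpha)$ and $\sigma_\alpha = 1 + \alpha$, and the density of $S_n$ at the point $x$ is the gamma density with shape $n$ at the point $x + n$. The function names are hypothetical.

```python
from math import exp, lgamma, log, pi, sqrt

def log_exact_density(n, alpha):
    """ln f_n(alpha * n), where S_n + n has the Gamma(n, 1) distribution."""
    y = n * (1.0 + alpha)
    return (n - 1) * log(y) - y - lgamma(n)

def log_local_approx(n, alpha):
    """Logarithm of exp(-n Lambda(alpha)) / (sigma_alpha sqrt(2 pi n))."""
    rate = alpha - log(1.0 + alpha)    # Lambda(alpha)
    sigma = 1.0 + alpha                # sigma_alpha
    return -n * rate - log(sigma * sqrt(2.0 * pi * n))

def density_ratio(n, alpha):
    """Exact density divided by the approximation of Theorem 9.3.3."""
    return exp(log_exact_density(n, alpha) - log_local_approx(n, alpha))
```

In this particular example the ratio reduces to Stirling's correction $n^n e^{-n}\sqrt{2\pi n}/n!$, so it does not depend on $\alpha$ at all, which is a transparent instance of the uniformity of the remainder term in $\alpha$.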


Remark 9.3.1 We can see from the proof that, in Theorem 9.3.3, as a more general
condition instead of condition [D] one could also consider the integrability of
$\psi^m(\lambda + it)$ for any fixed $\lambda \in [0, \lambda_1]$, $\lambda_1 < \lambda_+$, or condition [D] imposed on $S_m$ for
some $m \ge 1$.

For arithmetic distributions we cannot assume without loss of generality that
$m_1 = \mathbf{E}\xi = 0$, but that does not change much in the formulations of the assertions.
If $\lambda_+ > 0$, then $\alpha_+ = \psi'(\lambda_+)/\psi(\lambda_+) > m_1$ and the scaled deviations $\alpha = x/n$ for
the Cramér range must lie in the region $[m_1, \alpha_+)$.


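Theorem 9.3.4 Let $\lambda_+ > 0$, $\mathbf{E}\xi^2 < \infty$ and the distribution of $\xi$ be arithmetic. Then,
for integer $x$,

$$\mathbf{P}(S_n = x) = \frac{e^{-n\Lambda(\alpha)}}{\sigma_\alpha\sqrt{2\pi n}}\bigl(1 + o(1)\bigr),$$

where the remainder term $o(1)$ is uniform in $\alpha = x/n \in [m_1, \alpha_1]$ for any fixed
$\alpha_1 \in (m_1, \alpha_+)$.
A similar assertion is valid in the case when $\lambda_- < 0$ and $\alpha \in (\alpha_-, m_1]$.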


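Proof The proof does not differ much from that of Theorem 9.3.1. By (9.2.3),

$$\mathbf{P}(S_n = x) = e^{-\lambda(\alpha) x}\,\psi^n(\lambda(\alpha))\, \mathbf{P}\bigl(S_n^{(\alpha)} = x\bigr) = e^{-n\Lambda(\alpha)}\, \mathbf{P}\bigl(S_n^{(\alpha)} = x\bigr),$$

where $\mathbf{E}\xi^{(\alpha)} = \alpha$ for $\alpha \in [m_1, \alpha_+)$. In order to compute $\mathbf{P}(S_n^{(\alpha)} = x)$ we have to
use Theorem 8.7.3A. The verification of conditions (a) and (b) of Theorem 8.7.1A,
which are assumed to hold in Theorem 8.7.3A, is done in the same way as in the
proof of Theorem 9.3.1, the only difference being that relation (9.3.4) for $t_0 \in [\theta_1, \pi]$
will contradict the arithmeticity of the distribution of $\xi$. Since $a(\lambda(\alpha)) = \mathbf{E}\xi^{(\alpha)} = \alpha$,
by Theorem 8.7.3A we have

$$\mathbf{P}\bigl(S_n^{(\alpha)} = x\bigr) = \frac{1}{\sigma_\alpha\sqrt{2\pi n}}\bigl(1 + o(1)\bigr)$$

uniformly in $\alpha = x/n \in [m_1, \alpha_1]$. The theorem is proved.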
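For an arithmetic example, Theorem 9.3.4 can be compared with exact binomial probabilities. The sketch below is an illustration under the assumption, not made in the text, that the summands are Bernoulli with parameter `p`. In this case $m_1 = p$, $\alpha_+ = 1$, the Cramér transform $\xi^{(\alpha)}$ is Bernoulli with parameter $\alpha$, so that $\sigma_\alpha^2 = \alpha(1-\alpha)$, and $\Lambda(\alpha) = \alpha\ln(\alpha/p) + (1-\alpha)\ln((1-\alpha)/(1-p))$. The function names are hypothetical.

```python
from math import exp, lgamma, log, pi, sqrt

def log_exact_pmf(n, k, p):
    """ln P(S_n = k) for Bernoulli(p) summands."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

def log_local_lattice_approx(n, k, p):
    """Logarithm of exp(-n Lambda(alpha)) / (sigma_alpha sqrt(2 pi n)), alpha = k / n."""
    alpha = k / n
    rate = alpha * log(alpha / p) + (1 - alpha) * log((1 - alpha) / (1 - p))
    sigma = sqrt(alpha * (1 - alpha))   # standard deviation of Bernoulli(alpha)
    return -n * rate - log(sigma * sqrt(2.0 * pi * n))

def pmf_ratio(n, k, p):
    """Exact probability divided by the approximation of Theorem 9.3.4."""
    return exp(log_exact_pmf(n, k, p) - log_local_lattice_approx(n, k, p))
```

With `n = 1000`, `k = 600` and `p = 0.5` the deviation $x - m_1 n = 100$ is more than six standard deviations of $S_n$, yet the ratio is already very close to 1.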




9.4 Integro-Local Theorems at the Boundary of the Cramér 
Range 


9.4.1 Introduction 


In this section we again assume that Cramér's condition $\lambda_+ > 0$ is met. If $\alpha_+ = \infty$
then the theorems of Sect. 9.3 describe the large deviation probabilities for any
$\alpha = x/n$. But if $\alpha_+ < \infty$ then the approaches of Sect. 9.3 do not enable one to
find the asymptotics of probabilities of large deviations of $S_n$ for scaled deviations
$\alpha = x/n$ in the vicinity of the point $\alpha_+$.

In this section we consider the case $\alpha_+ < \infty$. If in this case $\lambda_+ = \infty$, then, by
property ($\Lambda$2)(i), we have $\alpha_+ = s_+ = \sup\{t : F_+(t) > 0\}$, and therefore the random
variables $\xi_k$ are bounded from above by the value $\alpha_+$, $\mathbf{P}(S_n \ge x) = 0$ for
$\alpha = x/n > \alpha_+$. We will not consider this case in what follows. Thus we will study
the case $\alpha_+ < \infty$, $\lambda_+ < \infty$.

In the present and the next sections, we will confine ourselves to considering
integro-local theorems in the non-lattice case with $\Delta = \Delta_n \to 0$ since, as we saw in
the previous section, local theorems differ from the integro-local theorems only in
that they are simpler. As in Sect. 9.3, the integral theorems can be easily obtained
from the integro-local theorems.


9.4.2 The Probabilities of Large Deviations of $S_n$ in an
$o(n)$-Vicinity of the Point $\alpha_+ n$; the Case $\psi''(\lambda_+) < \infty$


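In this subsection we will study the asymptotics of $\mathbf{P}(S_n \in \Delta[x))$, $x = \alpha n$, when $\alpha$
lies in the vicinity of the point $\alpha_+ < \infty$ and, moreover, $\psi''(\lambda_+) < \infty$. (The case of
distributions $\mathbf{F}$, for which $\lambda_+ < \infty$, $\alpha_+ < \infty$ and $\psi''(\lambda_+) < \infty$, will be illustrated
later, in Lemma 9.4.1.) Under the above-mentioned conditions, the Cramér transform
$\mathbf{F}_{(\lambda_+)}$ is well defined at the point $\lambda_+$, and the random variable $\xi^{(\alpha_+)}$ with the
distribution $\mathbf{F}_{(\lambda_+)}$ has mean $\alpha_+$ and a finite variance:

$$\mathbf{E}\xi^{(\alpha_+)} = \frac{\psi'(\lambda_+)}{\psi(\lambda_+)} = \alpha_+, \qquad \operatorname{Var}\bigl(\xi^{(\alpha_+)}\bigr) = \sigma_{\alpha_+}^2 = \frac{\psi''(\lambda_+)}{\psi(\lambda_+)} - \alpha_+^2 \tag{9.4.1}$$

(cf. (9.3.1)).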
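
Theorem 9.4.1 Let $\xi$ be a non-lattice random variable,

$$\lambda_+ \in (0, \infty), \qquad \psi''(\lambda_+) < \infty, \qquad y = x - \alpha_+ n = o(n).$$

If $\Delta_n \to 0$ slowly enough as $n \to \infty$ then

$$\mathbf{P}\bigl(S_n \in \Delta_n[x)\bigr) = \frac{\Delta_n}{\sigma_{\alpha_+}\sqrt{2\pi n}}\, e^{-n\Lambda(\alpha_+) - \lambda_+ y}\Bigl(\exp\Bigl\{-\frac{y^2}{2\sigma_{\alpha_+}^2 n}\Bigr\} + o(1)\Bigr),$$

where

$$\alpha_+ = \frac{\psi'(\lambda_+)}{\psi(\lambda_+)}, \qquad \sigma_{\alpha_+}^2 = \frac{\psi''(\lambda_+)}{\psi(\lambda_+)} - \alpha_+^2,$$

and the remainder term $o(1)$ is uniform in $y$.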


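Proof As in the proof of Theorem 9.3.1, we use the Cramér transform, but now at
the fixed point $\lambda_+$, so there will be no triangular array scheme when analysing the
sums $S_n^{(\alpha_+)}$. In this case the following analogue of Theorem 9.2.1 holds true.

Theorem 9.2.1A Let $\lambda_+ \in (0, \infty)$, $\alpha_+ < \infty$ and $y = x - n\alpha_+$. Then, for $x = n\alpha$
and any fixed $\Delta > 0$, the following representation is valid:

$$\mathbf{P}\bigl(S_n \in \Delta[x)\bigr) = e^{-n\Lambda(\alpha_+) - \lambda_+ y} \int_0^{\Delta} e^{-\lambda_+ z}\, \mathbf{P}\bigl(S_n^{(\alpha_+)} - \alpha n \in dz\bigr). \tag{9.4.2}$$

Proof of Theorem 9.2.1A repeats that of Theorem 9.2.1, the only difference being
that, as was already noted, the Cramér transform is now applied at the fixed point $\lambda_+$
which does not depend on $\alpha = x/n$. In this case, by (9.2.3),

$$\mathbf{P}(S_n \in dv) = e^{-\lambda_+ v + n\ln\psi(\lambda_+)}\, \mathbf{P}\bigl(S_n^{(\alpha_+)} \in dv\bigr) = e^{-n\Lambda(\alpha_+) + \lambda_+(\alpha_+ n - v)}\, \mathbf{P}\bigl(S_n^{(\alpha_+)} \in dv\bigr).$$

Integrating this equality in $v$ from $x$ to $x + \Delta$, changing the variable $v = x + z$
($x = n\alpha$), and noting that $\alpha_+ n - v = -y - z$, we obtain (9.4.2).
The theorem is proved.

Let us return to the proof of Theorem 9.4.1. Assuming that $\Delta = \Delta_n \to 0$, we
obtain, by Theorem 9.2.1A, that

$$\mathbf{P}\bigl(S_n \in \Delta_n[x)\bigr) = e^{-n\Lambda(\alpha_+) - \lambda_+ y}\, \mathbf{P}\bigl(S_n^{(\alpha_+)} - \alpha_+ n \in \Delta_n[y)\bigr)\bigl(1 + o(1)\bigr). \tag{9.4.3}$$

By virtue of (9.4.1), we can apply Theorem 8.7.1 to evaluate the probability on
the right-hand side of (9.4.3). This theorem implies that, as $\Delta_n \to 0$ slowly enough,

$$\begin{aligned}
\mathbf{P}\bigl(S_n^{(\alpha_+)} - \alpha_+ n \in \Delta_n[y)\bigr)
&= \frac{\Delta_n}{\sigma_{\alpha_+}\sqrt{n}}\,\phi\Bigl(\frac{y}{\sigma_{\alpha_+}\sqrt{n}}\Bigr) + o\Bigl(\frac{\Delta_n}{\sqrt{n}}\Bigr) \\
&= \frac{\Delta_n}{\sigma_{\alpha_+}\sqrt{2\pi n}} \exp\Bigl\{-\frac{y^2}{2\sigma_{\alpha_+}^2 n}\Bigr\} + o\Bigl(\frac{\Delta_n}{\sqrt{n}}\Bigr)
\end{aligned}$$

uniformly in $y$. This, together with (9.4.3), proves Theorem 9.4.1.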




9.4.3 The Class of Distributions $\mathcal{ER}$. The Probability of Large
Deviations of $S_n$ in an $o(n)$-Vicinity of the Point $\alpha_+ n$ for
Distributions $\mathbf{F}$ from the Class $\mathcal{ER}$ in Case $\psi''(\lambda_+) = \infty$


When studying the asymptotics of $\mathbf{P}(S_n \ge \alpha n)$ (or $\mathbf{P}(S_n \in \Delta[\alpha n))$) in the case where
$\psi''(\lambda_+) = \infty$ and $\alpha$ is in the vicinity of the point $\alpha_+ < \infty$, we have to impose
additional conditions on the distribution $\mathbf{F}$ similarly to what was done in Sect. 8.8
when studying convergence to stable laws.

To formulate these additional conditions it will be convenient to introduce certain
classes of distributions. If $\lambda_+ < \infty$, then it is natural to represent the right tails $F_+(t)$
as

$$F_+(t) = e^{-\lambda_+ t}\, V(t), \tag{9.4.4}$$

where, by the exponential Chebyshev inequality, $V(t) = e^{o(t)}$ as $t \to \infty$.

Definition 9.4.1 We will say that the distribution $\mathbf{F}$ of a random variable $\xi$ (or the
random variable $\xi$ itself) belongs to the class $\mathcal{R}$ if its right tail $F_+(t)$ is a regularly
varying function, i.e. can be represented as

$$F_+(t) = t^{-\beta} L(t), \tag{9.4.5}$$

where $L$ is a slowly varying function as $t \to \infty$ (see also Sect. 8.8 and Appendix 6).

We will say that the distribution $\mathbf{F}$ (or the random variable $\xi$) belongs to the
class $\mathcal{ER}$ if, in the representation (9.4.4), the function $V$ is regularly varying (which
will also be denoted as $V \in \mathcal{R}$).

Distributions from the class $\mathcal{R}$ have already appeared in Sect. 8.8.

The following assertion explains which distributions from $\mathcal{ER}$ correspond to the
cases $\alpha_+ = \infty$, $\alpha_+ < \infty$, $\psi''(\lambda_+) = \infty$ and $\psi''(\lambda_+) < \infty$.


Lemma 9.4.1 Let $\mathbf{F} \in \mathcal{ER}$. For $\alpha_+$ to be finite it is necessary and sufficient that

$$\int_1^\infty t V(t)\, dt < \infty.$$

For $\psi''(\lambda_+)$ to be finite, it is necessary and sufficient that

$$\int_1^\infty t^2 V(t)\, dt < \infty.$$

The assertion of the lemma means that $\alpha_+ < \infty$ if $\beta > 2$ in the representation
$V(t) = t^{-\beta} L(t)$, where $L$ is an s.v.f., and $\alpha_+ = \infty$ if $\beta < 2$. For $\beta = 2$, the finiteness
of $\alpha_+$ is equivalent to the finiteness of $\int_1^\infty t^{-1} L(t)\, dt$. The same is true for the
finiteness of $\psi''(\lambda_+)$.




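Proof of Lemma 9.4.1 We first prove the assertion concerning $\alpha_+$. Since

$$\alpha_+ = \frac{\psi'(\lambda_+)}{\psi(\lambda_+)},$$

we have to estimate the values of $\psi'(\lambda_+)$ and $\psi(\lambda_+)$. The finiteness of $\psi'(\lambda_+)$ is
equivalent to that of

$$-\int_1^\infty t e^{\lambda_+ t}\, dF_+(t) = \int_1^\infty t\bigl(\lambda_+ V(t)\, dt - dV(t)\bigr), \tag{9.4.6}$$

where, for $V(t) = o(1/t)$,

$$-\int_1^\infty t\, dV(t) = V(1) + \int_1^\infty V(t)\, dt.$$

Hence the finiteness of the integral on the left-hand side of (9.4.6) is equivalent to
that of the sum

$$\lambda_+ \int_1^\infty t V(t)\, dt + \int_1^\infty V(t)\, dt$$

or, which is the same, to the finiteness of the integral $\int_1^\infty t V(t)\, dt$. Similarly we see
that the finiteness of $\psi(\lambda_+)$ is equivalent to that of $\int_1^\infty V(t)\, dt$. This implies the
assertion of the lemma in the case $\int_1^\infty V(t)\, dt < \infty$, where one has $V(t) = o(1/t)$.
If $\int_1^\infty V(t)\, dt = \infty$, then $\psi(\lambda_+) = \infty$, $\ln\psi(\lambda) \to \infty$ as $\lambda \uparrow \lambda_+$ and hence
$\alpha_+ = \lim_{\lambda\uparrow\lambda_+} (\ln\psi(\lambda))' = \infty$.

The assertion concerning $\psi''(\lambda_+)$ can be proved in exactly the same way. The
lemma is proved.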


The lemma implies the following: 


(a) If $\beta < 2$ or $\beta = 2$ and $\int_1^\infty t^{-1} L(t)\, dt = \infty$, then $\alpha_+ = \infty$ and the theorems of the
previous section are applicable to $\mathbf{P}(S_n \ge x)$.

(b) If $\beta > 3$ or $\beta = 3$ and $\int_1^\infty t^{-1} L(t)\, dt < \infty$, then $\alpha_+ < \infty$, $\psi''(\lambda_+) < \infty$ and
we can apply Theorem 9.4.1.

It remains to consider the case

(c) $\beta \in [2, 3]$, where the integral $\int_1^\infty t^{-1} L(t)\, dt$ is finite for $\beta = 2$ and is infinite for
$\beta = 3$.

It is obvious that in case (c) we have $\alpha_+ < \infty$ and $\psi''(\lambda_+) = \infty$.
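The trichotomy (a), (b), (c) becomes completely explicit when the slowly varying factor is trivial. The sketch below is an illustration under the simplifying assumption, not made in the text, that $L(t) \equiv 1$, i.e. $V(t) = t^{-\beta}$. Then $\int_1^\infty t^k V(t)\,dt$ converges if and only if $\beta - k > 1$, and $\int_1^\infty t^{-1}L(t)\,dt = \infty$, so the boundary value $\beta = 2$ falls into case (a) and the boundary value $\beta = 3$ into case (c). The function names are hypothetical.

```python
def power_integral_is_finite(k, beta):
    """Whether the integral of t**k * t**(-beta) over [1, infinity) converges."""
    return beta - k > 1

def classify_power_tail(beta):
    """Return 'a', 'b' or 'c' for V(t) = t**(-beta), following Lemma 9.4.1.

    'a': alpha_+ is infinite (the theorems of Sect. 9.3 apply);
    'b': alpha_+ and psi''(lambda_+) are finite (Theorem 9.4.1 applies);
    'c': alpha_+ is finite but psi''(lambda_+) is infinite.
    """
    alpha_plus_finite = power_integral_is_finite(1, beta)   # integral of t V(t)
    psi2_finite = power_integral_is_finite(2, beta)         # integral of t^2 V(t)
    if not alpha_plus_finite:
        return "a"
    return "b" if psi2_finite else "c"
```

With a non-trivial slowly varying factor $L$ the boundary values $\beta = 2$ and $\beta = 3$ may fall into either of the neighbouring cases, depending on the convergence of $\int_1^\infty t^{-1}L(t)\,dt$.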
Put

$$V_+(t) := \frac{\lambda_+ t V(t)}{\beta\psi(\lambda_+)}, \qquad b(n) := V_+^{(-1)}\Bigl(\frac{1}{n}\Bigr),$$

where $V_+^{(-1)}(1/n)$ is the value of the function inverse to $V_+$ at the point $1/n$.




Theorem 9.4.2 Let $\xi$ be a non-lattice random variable, $\mathbf{F} \in \mathcal{ER}$ and condition (c)
hold. If $\Delta_n \to 0$ slowly enough as $n \to \infty$, then, for $y = x - \alpha_+ n = o(n)$,

$$\mathbf{P}\bigl(S_n \in \Delta_n[x)\bigr) = \frac{\Delta_n e^{-n\Lambda(\alpha_+) - \lambda_+ y}}{b(n)}\Bigl(f^{(\beta-1,1)}\Bigl(\frac{y}{b(n)}\Bigr) + o(1)\Bigr),$$

where $f^{(\beta-1,1)}$ is the density of the stable law $\mathbf{F}_{(\beta-1,1)}$ with parameters $\beta - 1$, 1,
and the remainder term $o(1)$ is uniform in $y$.

We will see from the proof of the theorem that studying the probabilities of large
deviations in the case where $\alpha_+ < \infty$ and $\psi''(\lambda_+) = \infty$ is basically impossible
outside the class $\mathcal{ER}$, since it is impossible to find theorems on the limiting
distribution of $S_n$ in the case $\operatorname{Var}(\xi) = \infty$ without the conditions $[\mathbf{R}_{\beta,\rho}]$ of Sect. 8.8 being
satisfied.


Proof of Theorem 9.4.2 Condition (c) implies that $\alpha_+ = \mathbf{E}\xi^{(\alpha_+)} < \infty$ and
$\operatorname{Var}(\xi^{(\alpha_+)}) = \infty$. We will use Theorem 9.2.1A. For $\Delta_n \to 0$ slowly enough we will
obtain, as in the proof of Theorem 9.4.1, that relation (9.4.3) holds true. But now,
in contrast to Theorem 9.4.1, in order to calculate the probability on the right-hand
side of (9.4.3), we have to employ the integro-local Theorem 8.8.3 on convergence
to a stable law. In our case, by the properties of r.v.f.s, one has

$$\begin{aligned}
\mathbf{P}\bigl(\xi^{(\alpha_+)} \ge t\bigr)
&= \frac{1}{\psi(\lambda_+)} \int_t^\infty e^{\lambda_+ u}\, d\mathbf{F}(u) = \frac{1}{\psi(\lambda_+)} \int_t^\infty \bigl(\lambda_+ V(u)\, du - dV(u)\bigr) \\
&= \frac{\lambda_+}{\beta\psi(\lambda_+)}\, t^{-\beta+1} L_+(t) \sim V_+(t),
\end{aligned} \tag{9.4.7}$$

where $L_+(t) \sim L(t)$ is a slowly varying function. Moreover, the left tail of the
distribution $\mathbf{F}_{(\lambda_+)}$ decays at least exponentially fast. By virtue of the results of Sect. 8.8,
this means that, for $b(n) = V_+^{(-1)}(1/n)$, we have convergence of the distributions
of $(S_n^{(\alpha_+)} - \alpha_+ n)/b(n)$ to the stable law $\mathbf{F}_{\beta-1,1}$ with parameters $\beta - 1 \in [1, 2]$ and 1. It
remains to use representation (9.4.3) and Theorem 8.8.3 which implies that, provided
$\Delta_n \to 0$ slowly enough, one has

$$\mathbf{P}\bigl(S_n^{(\alpha_+)} - \alpha_+ n \in \Delta_n[y)\bigr) = \frac{\Delta_n}{b(n)}\, f^{(\beta-1,1)}\Bigl(\frac{y}{b(n)}\Bigr) + o\Bigl(\frac{\Delta_n}{b(n)}\Bigr)$$

uniformly in $y$. The theorem is proved.

Theorem 9.4.2 concludes the study of probabilities of large deviations of $S_n/n$
in the vicinity of the point $\alpha_+$ for distributions from the class $\mathcal{ER}$.




9.4.4 On the Large Deviation Probabilities in the Range $\alpha > \alpha_+$
for Distributions from the Class $\mathcal{ER}$


Now assume that the deviations $x$ of $S_n$ are such that $\alpha = x/n > \alpha_+$, and
$y = x - \alpha_+ n$ grows fast enough (faster than $\sqrt{n}$ under the conditions of Theorem 9.4.1 and
faster than $b(n)$ under the conditions of Theorem 9.4.2). Then, for the probability

$$\mathbf{P}\bigl(S_n^{(\alpha_+)} - \alpha_+ n \in \Delta_n[y)\bigr), \tag{9.4.8}$$

the deviations $y$ (see representation (9.4.3)) will belong to the zone of large
deviations, so applying Theorems 8.7.1 and 8.8.3 to evaluate such probabilities does
not make much sense. Relation (9.4.7) implies that, in the case $\mathbf{F} \in \mathcal{ER}$, we have
$\mathbf{F}^{(\alpha_+)} \in \mathcal{R}$. Therefore, we will know the asymptotics of the probability (9.4.8) (and
hence also of the probability $\mathbf{P}(S_n \in \Delta_n[x))$, see (9.4.3)) if we obtain integro-local
theorems for the probabilities of large deviations of the sums $S_n$ in the case where
the summands belong to the class $\mathcal{R}$. Such theorems are also of independent
interest in the present chapter, and the next section will be devoted to them. After that,
in Sect. 9.6 we will return to the problem on large deviation probabilities in the
class $\mathcal{ER}$ mentioned in the title of this section.


9.5 Integral and Integro-Local Theorems on Large Deviation
Probabilities for Sums $S_n$ when the Cramér Condition Is Not
Met


If $\mathbf{E}\xi = 0$ and the right-side Cramér condition is not met ($\lambda_+ = 0$), then the rate
function $\Lambda(\alpha)$ degenerates on the right semiaxis: $\Lambda(\alpha) = \lambda(\alpha) = 0$ for $\alpha \ge 0$, and
the results of Sects. 9.1-9.4 on the probabilities of large deviations of $S_n$ are of little
substance. In this case, in order to find the asymptotics of $\mathbf{P}(S_n \ge x)$ and
$\mathbf{P}(S_n \in \Delta[x))$, we need completely different approaches, while finding these asymptotics is
only possible under additional conditions on the behaviour of the tail $F_+(t)$ of the
distribution $\mathbf{F}$, similarly to what happened in Sect. 8.8 when studying convergence
to stable laws.

The above-mentioned additional conditions consist of the assumption that the tail
$F_+(t)$ behaves regularly enough. In this section we will assume that $F_+(t) = V(t) \in \mathcal{R}$,
where $\mathcal{R}$ is the class of regularly varying functions introduced in the previous
section (see also Appendix 6). To make the exposition more homogeneous, we will
confine ourselves to the case $\beta > 2$, $\operatorname{Var}(\xi) < \infty$, where $-\beta$ is the power exponent
in the function $V \in \mathcal{R}$ (see (9.4.5)). Studying the case $\beta \in [1, 2]$ ($\operatorname{Var}(\xi) = \infty$) does
not differ much from the exposition below, but it would significantly increase the
volume of the exposition and complicate the text, and therefore is omitted. Results
for the case $\beta \in (0, 2]$ can be found in [8, Chap. 3].




9.5.1 Integral Theorems 


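Integral theorems for probabilities of large deviations of $S_n$ and of the maxima
$\overline{S}_n = \max_{k \le n} S_k$ in the case $\mathbf{E}\xi = 0$, $\operatorname{Var}(\xi) < \infty$, $\mathbf{F} \in \mathcal{R}$, $\beta > 2$, follow immediately from
the bounds obtained in Appendix 8. In particular, Corollaries A8.2.1 and A8.3.1 of
Appendix 8 imply the following result.


Theorem 9.5.1 Let $\mathbf{E}\xi = 0$, $\operatorname{Var}(\xi) < \infty$, $\mathbf{F} \in \mathcal{R}$ and $\beta > 2$. Then, for $x \gg \sqrt{n \ln n}$,

$$\mathbf{P}(S_n \ge x) \sim \mathbf{P}(\overline{S}_n \ge x) \sim nV(x). \qquad (9.5.1)$$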


Under an additional condition $[\mathbf{D}_0]$ to be introduced below, the assertion of this
theorem will also follow from the integro-local Theorem 9.5.2 (see below).

Comparing Theorem 9.5.1 with the results of Sects. 9.2-9.4 shows that the nature
of the large deviation probabilities is completely different here. Under the Cramér
condition and for $\alpha = x/n \in (0, \alpha_+)$, the large deviations of $S_n$ are, roughly speaking,
"equally contributed to by all the summands" $\xi_k$, $k \le n$. This is confirmed by
the fact that, for a fixed $\alpha$, the limiting conditional distribution of $\xi_k$, $k \le n$, given
that $S_n \in \Delta[x)$ (or $S_n \ge x$) for $x = \alpha n$, $\Delta = 1$, as $n \to \infty$ coincides with the distribution
$\mathbf{F}^{(\alpha)}$ of the random variable $\xi^{(\alpha)}$. The reader can verify this
using Theorem 9.3.2. In other words, the conditions $\{S_n \in \Delta[x)\}$ (or $\{S_n \ge x\}$),
$x = \alpha n$, change equally (from $\mathbf{F}$ to $\mathbf{F}^{(\alpha)}$) the distributions of all the summands.

However, if the Cramér condition is not met, then under the conditions of Theorem 9.5.1
the large deviations of $S_n$ are essentially due to one large (comparable
with $x$) jump. This is seen from the fact that the value of $nV(x)$ on the right-hand
side of (9.5.1) is nothing else but the main term of the asymptotics for $\mathbf{P}(\overline{\xi}_n \ge x)$,
where $\overline{\xi}_n = \max_{k \le n} \xi_k$. Indeed, if $nV(x) \to 0$ then

$$\mathbf{P}(\overline{\xi}_n < x) = (1 - V(x))^n = 1 - nV(x) + O((nV(x))^2),$$

$$\mathbf{P}(\overline{\xi}_n \ge x) = nV(x) + O((nV(x))^2) \sim nV(x).$$

In other words, the probabilities of large deviations of $S_n$, $\overline{S}_n$ and $\overline{\xi}_n$ are asymptotically
the same. The fact that the probabilities of the events $\{\xi_j \ge y\}$ for $y \sim x$
play the determining role in finding the asymptotics of $\mathbf{P}(S_n \ge x)$ can easily be
discovered in the bounds from Appendix 8.
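
The comparison between the exact probability for the maximum and its main term $nV(x)$ can be checked numerically. The following is a minimal sketch under stated assumptions: the pure power tail $V(x) = x^{-\beta}$ for $x \ge 1$ and the function names `max_tail_exact` and `power_tail` are illustrative choices, not taken from the text.

```python
import math


def max_tail_exact(n: int, v: float) -> float:
    """P(max(xi_1, ..., xi_n) >= x) for i.i.d. xi_k with P(xi_k >= x) = v.

    This is 1 - (1 - v)**n; expm1/log1p keep precision when v is tiny.
    """
    if v >= 1.0:
        return 1.0
    return -math.expm1(n * math.log1p(-v))


def power_tail(x: float, beta: float) -> float:
    """Illustrative tail V(x) = x**(-beta) for x >= 1 (an assumption)."""
    return x ** (-beta) if x >= 1.0 else 1.0


if __name__ == "__main__":
    n, beta = 100, 3.0
    for x in (10.0, 30.0, 100.0, 1000.0):
        v = power_tail(x, beta)
        exact = max_tail_exact(n, v)
        # The ratio approaches 1 as n * V(x) -> 0.
        print(f"x={x:7.1f}  n*V(x)={n * v:.3e}  exact={exact:.3e}  "
              f"ratio={exact / (n * v):.6f}")
```

The ratio of the exact value to $nV(x)$ stays below 1 (the union bound) and tends to 1 as $nV(x) \to 0$, in agreement with the two displayed relations.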

Thus, while the asymptotics of $\mathbf{P}(S_n \ge x)$ for $x = \alpha n \gg \sqrt{n}$ in the Cramér case
is determined by "the whole distribution $\mathbf{F}$" (as the rate function $\Lambda(\alpha)$ depends on
"the whole distribution $\mathbf{F}$"), these asymptotics in the case $\mathbf{F} \in \mathcal{R}$ are determined
by the right tail $F_+(t) = V(t)$ only and do not depend on the "remaining part" of
the distribution $\mathbf{F}$ (for the fixed value of $\mathbf{E}\xi = 0$).




9.5.2 Integro-Local Theorems 


In this section we will study the asymptotics of $\mathbf{P}(S_n \in \Delta[x))$ in the case where

$$\mathbf{E}\xi = 0, \quad \operatorname{Var}(\xi) < \infty, \quad \mathbf{F} \in \mathcal{R}, \quad \beta > 2, \quad x \gg \sqrt{n \ln n}. \qquad (9.5.2)$$

These asymptotics are of independent interest and are also useful, for example, in
finding the asymptotics of integrals of type $\mathbf{E}(g(S_n); S_n \ge x)$ for $x \gg \sqrt{n \ln n}$ for
a wide class of functions $g$. As was already noted (see Subsection 4.4), in the next
section we will use the results from the present section to obtain integro-local theorems
under the Cramér condition (for summands from the class $\mathcal{ER}$) for deviations
outside the Cramér zone.

In order to obtain integro-local theorems in this section, we will need additional
conditions. Besides condition $\mathbf{F} \in \mathcal{R}$, we will also assume that the following holds:


Condition $[\mathbf{D}_0]$ For each fixed $\Delta$, as $t \to \infty$,

$$V(t) - V(t + \Delta) = v(t)(\Delta + o(1)), \qquad v(t) = \frac{\beta V(t)}{t}.$$


It is clear that if the function $L(t)$ in representation (9.4.5) (or the function $V(t)$)
is differentiable for $t$ large enough and $L'(t) = o(L(t)/t)$ as $t \to \infty$ (all sufficiently
smooth s.v.f.s possess this property; cf. e.g., polynomials of $\ln t$ etc.), then condition
$[\mathbf{D}_0]$ will be satisfied, and the derivative $-V'(t) \sim v(t)$ will play the role of the
function $v(t)$.
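
To see what a condition of this type asks for, one can compare the increment $V(t) - V(t + \Delta)$ with $\Delta \beta V(t)/t$ for a concrete tail. The sketch below assumes the pure power tail $V(t) = t^{-\beta}$ (so that $L \equiv 1$ and $-V'(t) = \beta V(t)/t$ exactly); the function name `increment_ratio` is hypothetical.

```python
import math


def increment_ratio(t: float, delta: float, beta: float) -> float:
    """Ratio (V(t) - V(t + delta)) / (delta * beta * V(t) / t) for V(t) = t**(-beta).

    V(t) - V(t + delta) = V(t) * (1 - (1 + delta / t)**(-beta)), so V(t) cancels;
    expm1/log1p avoid the loss of digits in the subtraction for large t.
    """
    relative_drop = -math.expm1(-beta * math.log1p(delta / t))
    return relative_drop / (delta * beta / t)


if __name__ == "__main__":
    for t in (10.0, 100.0, 1e3, 1e4):
        # The ratio tends to 1 as t grows, for each fixed delta.
        print(t, increment_ratio(t, 1.0, 3.0))
```

For fixed $\Delta$ the ratio approaches 1 as $t$ grows, which is the content of the displayed relation for this particular $V$.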


Theorem 9.5.2 Let conditions (9.5.2) and $[\mathbf{D}_0]$ be met. Then

$$\mathbf{P}(S_n \in \Delta[x)) = \Delta n v(x)(1 + o(1)), \qquad v(x) = \frac{\beta V(x)}{x},$$

where the remainder term $o(1)$ is uniform in $x \ge N\sqrt{n \ln n}$ and $\Delta \in [\Delta_1, \Delta_2]$ for
any fixed $\Delta_2 > \Delta_1 > 0$ and any fixed sequence $N \to \infty$.


Note that in Theorems 9.5.1 and 9.5.2 we do not assume that $n \to \infty$. The assumption
that $x \to \infty$ is contained in (9.5.2).


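Proof For $y < x$, introduce the events

$$G_n := \{S_n \in \Delta[x)\}, \qquad B_j := \{\xi_j < y\}, \qquad B := \bigcap_{j=1}^{n} B_j. \qquad (9.5.3)$$

Then

$$\mathbf{P}(G_n) = \mathbf{P}(G_n B) + \mathbf{P}(G_n \overline{B}), \qquad \overline{B} = \bigcup_{j=1}^{n} \overline{B}_j, \qquad (9.5.4)$$

where

$$\sum_{j=1}^{n} \mathbf{P}(G_n \overline{B}_j) \ge \mathbf{P}(G_n \overline{B}) \ge \sum_{j=1}^{n} \mathbf{P}(G_n \overline{B}_j) - \sum_{i<j\le n} \mathbf{P}(G_n \overline{B}_i \overline{B}_j) \qquad (9.5.5)$$

(see property 8 in Sect. 9.2.2).

The proof is divided into three stages: the bounding of $\mathbf{P}(G_n B)$, that of
$\mathbf{P}(G_n \overline{B}_i \overline{B}_j)$, $i \ne j$, and the evaluation of $\mathbf{P}(G_n \overline{B}_j)$.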

(1) A bound on $\mathbf{P}(G_n B)$. We will make use of the rough inequality

$$\mathbf{P}(G_n B) \le \mathbf{P}(S_n \ge x; B) \qquad (9.5.6)$$

and Theorem A8.2.1 of Appendix 8 which implies that, for $x = ry$ with a fixed
$r > 2$, any $\delta > 0$, and $x \ge N\sqrt{n \ln n}$, $N \to \infty$, we have

$$\mathbf{P}(S_n \ge x; B) \le (nV(y))^{r-\delta}. \qquad (9.5.7)$$

Here we can always choose $r$ such that

$$(nV(x))^{r-\delta} \ll n\Delta v(x) \qquad (9.5.8)$$

for $x \gg \sqrt{n}$. Indeed, putting $n := x^2$ and comparing the powers of $x$ on the right-hand
and left-hand sides of (9.5.8), we obtain that for (9.5.8) to hold it suffices to
choose $r$ such that

$$(2 - \beta)(r - \delta) < 1 - \beta,$$

which is equivalent, for $\beta > 2$, to the inequality

$$r > \frac{\beta - 1}{\beta - 2} + \delta.$$

For such $r$, we will have that, by (9.5.6)-(9.5.8),

$$\mathbf{P}(G_n B) = o(n\Delta v(x)). \qquad (9.5.9)$$

Since $r - \delta > 1$, we see that, for $n < x^2$, relations (9.5.8) and (9.5.9) will hold true
all the more.
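(2) A bound for $\mathbf{P}(G_n \overline{B}_i \overline{B}_j)$. It is sufficient to bound $\mathbf{P}(G_n \overline{B}_{n-1} \overline{B}_n)$. Set

$$\delta := \frac{1}{r}, \qquad H_k := \{v : v < (1 - k\delta)x + \Delta\}, \quad k = 1, 2.$$

Then

$$\mathbf{P}(G_n \overline{B}_{n-1} \overline{B}_n) = \int_{H_2} \mathbf{P}(S_{n-2} \in dz) \int_{H_1} \mathbf{P}(z + \xi \in dv,\ \xi \ge \delta x)\, \mathbf{P}(v + \xi \in \Delta[x),\ \xi \ge \delta x). \qquad (9.5.10)$$

Since in the domain $H_1$ we have $x - v > \delta x - \Delta$, the last factor on the right-hand
side of (9.5.10) has, by condition $[\mathbf{D}_0]$, the form $\Delta v(x - v)(1 + o(1)) \le c\Delta v(x)$ as
$x \to \infty$, so the integral over $H_1$ in (9.5.10), for $x$ large enough, does not exceed

$$c\Delta v(x)\, \mathbf{P}(z + \xi \in H_1;\ \xi \ge \delta x) \le c\Delta v(x) V(\delta x).$$

The integral over the domain $H_2$ in (9.5.10) evidently allows a similar bound. Since
$nV(x) \to 0$, we obtain that

$$\sum_{i<j\le n} \mathbf{P}(G_n \overline{B}_i \overline{B}_j) \le c_1 \Delta n^2 v(x) V(x) = o(\Delta n v(x)). \qquad (9.5.11)$$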


(3) The evaluation of $\mathbf{P}(G_n \overline{B}_j)$ is based on the relation

$$\mathbf{P}(G_n \overline{B}_n) = \int_{H_1} \mathbf{P}(S_{n-1} \in dz)\, \mathbf{P}(\xi \in \Delta[x - z),\ \xi \ge \delta x)$$
$$\le \int_{H_1} \mathbf{P}(S_{n-1} \in dz)\, \mathbf{P}(\xi \in \Delta[x - z))$$
$$= \Delta \int_{H_1} \mathbf{P}(S_{n-1} \in dz)\, v(x - z)(1 + o(1)), \qquad (9.5.12)$$

which yields

$$\mathbf{P}(G_n \overline{B}_n) \le \Delta\, \mathbf{E}[v(x - S_{n-1});\ S_{n-1} < (1 - \delta)x + \Delta](1 + o(1)) = \Delta v(x)(1 + o(1)). \qquad (9.5.13)$$

The last relation is valid for $x \gg \sqrt{n}$, since, by Chebyshev's inequality,
$\mathbf{E}[v(x - S_{n-1});\ |S_{n-1}| < M\sqrt{n}] \sim v(x)$ as $M \to \infty$, $M\sqrt{n} = o(x)$ and, moreover, the following
evident bounds hold:

$$\mathbf{E}[v(x - S_{n-1});\ S_{n-1} \in (M\sqrt{n}, (1 - \delta)x + \Delta)] = o(v(x)),$$

$$\mathbf{E}[v(x - S_{n-1});\ S_{n-1} \in (-\infty, -M\sqrt{n})] = o(v(x))$$

as $M \to \infty$.
Similarly, by virtue of (9.5.12), we get

$$\mathbf{P}(G_n \overline{B}_n) \ge \int_{-\infty}^{(1-\delta)x} \mathbf{P}(S_{n-1} \in dz)\, \mathbf{P}(\xi \in \Delta[x - z)) \sim \Delta v(x). \qquad (9.5.14)$$

From (9.5.13) and (9.5.14) we obtain that

$$\mathbf{P}(G_n \overline{B}_n) = \Delta v(x)(1 + o(1)).$$

This, together with (9.5.4), (9.5.9) and (9.5.11), yields the representation

$$\mathbf{P}(G_n) = \Delta n v(x)(1 + o(1)).$$

The required uniformity of the term $o(1)$ clearly follows from the preceding argument.
The theorem is proved.
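
The power counting in stage (1) of the proof can be made concrete. With $n = x^2$ and slowly varying factors ignored, $nV(x)$ behaves like $x^{2-\beta}$ and $n\Delta v(x)$ like $x^{1-\beta}$, so (9.5.8) reduces to a comparison of exponents. The following sketch only performs that arithmetic; the helper names `r_threshold` and `exponents` are hypothetical, and the requirement $r > 2$ from Theorem A8.2.1 is taken from the text.

```python
def r_threshold(beta: float, delta: float) -> float:
    """Infimum of admissible r: any r strictly above this value works.

    Combines the requirement r > 2 with (2 - beta) * (r - delta) < 1 - beta,
    i.e. r > (beta - 1) / (beta - 2) + delta for beta > 2.
    """
    if beta <= 2.0:
        raise ValueError("the argument in the text assumes beta > 2")
    return max(2.0, (beta - 1.0) / (beta - 2.0) + delta)


def exponents(beta: float, r: float, delta: float) -> tuple:
    """Powers of x on both sides of (9.5.8) when n = x**2."""
    left = (2.0 - beta) * (r - delta)   # (n V(x))**(r - delta) ~ x**left
    right = 1.0 - beta                  # n * Delta * v(x)      ~ x**right
    return left, right


if __name__ == "__main__":
    beta, delta = 3.0, 0.1
    print("threshold for r:", r_threshold(beta, delta))
    for r in (2.05, 2.2, 3.0):
        left, right = exponents(beta, r, delta)
        print(r, left, right, "ok" if left < right else "too small")
```

For $\beta$ close to 2 the threshold becomes large, which reflects the fact that the truncated-sum bound (9.5.7) must be raised to a high power before it is negligible compared with $n\Delta v(x)$.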


Theorem 9.5.2 implies the following


Corollary 9.5.1 Let the conditions of Theorem 9.5.2 be satisfied. Then there exists
a fixed sequence $\Delta_N$ converging to zero slowly enough as $N \to \infty$ such that the
assertion of Theorem 9.5.2 remains true when the segment $[\Delta_1, \Delta_2]$ is replaced in
it with $[\Delta_N, \Delta_2]$.


9.6 Integro-Local Theorems on the Probabilities of Large
Deviations of $S_n$ Outside the Cramér Range (Under the
Cramér Condition)


We return to the case where the Cramér condition is met. In Sects. 9.3 and 9.4
we obtained integro-local theorems for deviations inside and on the boundary of
the Cramér range. It remains to study the asymptotics of $\mathbf{P}(S_n \in \Delta[x))$ outside
the Cramér range, i.e. for $\alpha = x/n > \alpha_+$. Preliminary observations concerning this
problem were made in Sect. 9.4.4 where it was reduced to integro-local theorems
for the sums $S_n$ when Cramér's condition is not satisfied. Recall that in that case we
had to restrict ourselves to considering distributions from the class $\mathcal{ER}$ defined in
Sect. 9.4.3 (see (9.4.4)).


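Theorem 9.6.1 Let $\mathbf{F} \in \mathcal{ER}$, $\beta > 3$, $\alpha = x/n > \alpha_+$ and $y = x - \alpha_+ n \gg \sqrt{n}$. Then
there exists a fixed sequence $\Delta_N$ converging to zero slowly enough as $N \to \infty$, such
that

$$\mathbf{P}(S_n \in \Delta_N[x)) = e^{-n\Lambda(\alpha_+) - \lambda_+ y}\, n\Delta_N v_+(y)(1 + o(1)) = e^{-n\Lambda(\alpha)}\, n\Delta_N v_+(y)(1 + o(1)),$$

where $v_+(y) = \lambda_+ V(y)/\psi(\lambda_+)$, the remainder term $o(1)$ is uniform in $x$ and $n$ such
that $y \gg N\sqrt{n \ln n}$, $N$ being an arbitrary fixed sequence tending to $\infty$.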


Proof By Theorem 9.2.1A there exists a sequence $\Delta_N$ converging to zero slowly
enough such that (cf. (9.4.3))

$$\mathbf{P}(S_n \in \Delta_N[x)) = e^{-n\Lambda(\alpha_+) - \lambda_+ y}\, \mathbf{P}\bigl(S_n^{(\alpha_+)} - \alpha_+ n \in \Delta_N[y)\bigr). \qquad (9.6.1)$$

Since by properties ($\Lambda$1) and ($\Lambda$2) the function $\Lambda(\alpha)$ is linear for $\alpha > \alpha_+$:

$$\Lambda(\alpha) = \Lambda(\alpha_+) + (\alpha - \alpha_+)\lambda_+,$$

the exponent in (9.6.1) can be rewritten as

$$-n\Lambda(\alpha_+) - \lambda_+ y = -n\Lambda(\alpha).$$


The right tail of the distribution of $\xi^{(\alpha_+)}$ has the form (see (9.4.7))

$$\mathbf{P}(\xi^{(\alpha_+)} \ge t) = \frac{\lambda_+}{\psi(\lambda_+)} \int_t^{\infty} V(u)\,du + V(t).$$

By the properties of regularly varying functions (see Appendix 6),

$$V(t) - V(t - u) = o(V(t))$$

as $t \to \infty$ for any fixed $u$. This implies that condition $[\mathbf{D}_0]$ of Sect. 9.5 is satisfied
for the distribution of $\xi^{(\alpha_+)}$.

This means that, in order to calculate the probability on the right-hand side
of (9.6.1), we can use Theorem 9.5.2 and Corollary 9.5.1, by virtue of which, as
$\Delta_N \to 0$ slowly enough,

$$\mathbf{P}\bigl(S_n^{(\alpha_+)} - \alpha_+ n \in \Delta_N[y)\bigr) = n\Delta_N v_+(y)(1 + o(1)),$$

where the remainder term $o(1)$ is uniform in all $x$ and $n$ such that $y \gg N\sqrt{n \ln n}$,
$N \to \infty$.
The theorem is proved.


Since $\mathbf{P}(S_n \in \Delta_N[x))$ decreases exponentially fast as $x$ (or $y$) grows (note the
factor $e^{-\lambda_+ y}$ in (9.6.1)), Theorem 9.6.1 immediately implies the following integral
theorem.


Corollary 9.6.1 Under the conditions of Theorem 9.6.1,

$$\mathbf{P}(S_n \ge x) = e^{-n\Lambda(\alpha)}\, \frac{nV(y)}{\psi(\lambda_+)}\,(1 + o(1)).$$


Proof Represent the probability $\mathbf{P}(S_n \ge x)$ as the sum

$$\mathbf{P}(S_n \ge x) = \sum_{k=0}^{\infty} \mathbf{P}(S_n \in \Delta_N[x + k\Delta_N)) \sim e^{-n\Lambda(\alpha)}\, \frac{n\lambda_+}{\psi(\lambda_+)} \sum_{k=0}^{\infty} \Delta_N V(y + \Delta_N k)\, e^{-\lambda_+ \Delta_N k}.$$

Here the series on the right-hand side is asymptotically equivalent, as $N \to \infty$, to
the integral

$$V(y) \int_0^{\infty} e^{-\lambda_+ t}\,dt = \frac{V(y)}{\lambda_+}.$$

The corollary is proved.
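
The passage from the series to the integral in this proof can be illustrated numerically. The sketch below is only an illustration under assumptions: it takes $V(t) = t^{-\beta}$, writes `step` for $\Delta_N$ and `lam` for $\lambda_+$, and truncates the series once the exponential factor is below $e^{-60}$; none of these names or values come from the text.

```python
import math


def tail_series(y: float, step: float, lam: float, beta: float) -> float:
    """sum_{k>=0} step * V(y + step*k) * exp(-lam*step*k) for V(t) = t**(-beta)."""
    total = 0.0
    k = 0
    while lam * step * k <= 60.0:  # later terms are negligible in double precision
        total += step * (y + step * k) ** (-beta) * math.exp(-lam * step * k)
        k += 1
    return total


def ratio_to_limit(y: float, step: float, lam: float, beta: float) -> float:
    """Series divided by its claimed equivalent V(y) / lam."""
    return tail_series(y, step, lam, beta) / (y ** (-beta) / lam)


if __name__ == "__main__":
    for step in (0.5, 0.1, 0.01):
        # The ratio approaches 1 only when the step is small and y is large.
        print(step, ratio_to_limit(1000.0, step, 1.0, 4.0))
```

With a coarse step the ratio stays visibly above 1, which is one way to see why the window width $\Delta_N$ has to tend to zero for the equivalence to hold.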


Note that a similar corollary (i.e. the integral theorem) can be obtained under the
conditions of Theorem 9.4.2 as well.

In the range of deviations $\alpha = x/n > \alpha_+$, only the case $\mathbf{F} \in \mathcal{ER}$, $\beta \in [2, 3]$ (recall
that $\alpha_+ = \infty$ for $\beta < 2$) has not been considered in this text. As we have already
said, it could also be considered, but that would significantly increase the length and
complexity of the exposition. Results dealing with this case can be found in [8]; one
can also find there a more complete study of large deviation probabilities.


Chapter 10 
Renewal Processes 


Abstract This is the first chapter in the book to deal with random processes in con- 
tinuous time, namely, with the so-called renewal processes. Section 10.1 establishes 
the basic terminology and proves the integral renewal theorem in the case of non- 
identically distributed random variables. The classical Key Renewal Theorem in the 
arithmetic case is proved in Sect. 10.2, including its extension to the case where 
random variables can assume negative values. The limiting behaviour of the excess 
and defect of a random walk at a growing level is established in Sect. 10.3. Then 
these results are extended to the non-arithmetic case in Sect. 10.4. Section 10.5 is 
devoted to the Law of Large Numbers and the Central Limit Theorem for renewal 
processes. It also contains the proofs of these laws for the maxima of sums of in- 
dependent non-identically distributed random variables that can take values of both 
signs, and a local limit theorem for the first hitting time of a growing level. The chap- 
ter ends with Sect. 10.6 introducing generalised (compound) renewal processes and 
establishing for them the Central Limit Theorem, in both integral and integro-local 
forms. 


10.1 Renewal Processes. Renewal Functions 


10.1.1 Introduction 


The sequence of sums of random variables $\{S_n\}$, considered in previous chapters, is
often called a random walk. It can be considered as the simplest random process in
discrete time $n$. The further study of such processes is contained in Chaps. 11, 12
and 20.

In this chapter we consider the simplest processes in continuous time t that are 
also entirely determined by a sequence of independent random variables and do 
not require, for their construction, any special structures (in the general case such 
constructions will be needed; see Chap. 18). 

Let $\tau_1$, $\{\tau_j\}_{j=2}^{\infty}$ be a sequence of independent random variables given on a probability
space $(\Omega, \mathfrak{F}, \mathbf{P})$ (here we change our conventional notation $\xi_j$ to $\tau_j$ for reasons
that will become clear in Sect. 10.6, where $\xi_j$ appear again). For the random
variables $\tau_2, \tau_3, \ldots$ we will usually assume some homogeneity property: proximity
of the expectations or identical distributions. The random variable $\tau_1$ can be arbitrary.

A.A. Borovkov, Probability Theory, Universitext,
DOI 10.1007/978-1-4471-5201-9_10, © Springer-Verlag London 2013


Definition 10.1.1 A renewal process is a collection of random variables $\eta(t)$ depending
on a parameter $t$ and defined on $(\Omega, \mathfrak{F}, \mathbf{P})$ by the equality

$$\eta(t) := \min\{k \ge 0 : T_k > t\}, \qquad t \ge 0, \qquad (10.1.1)$$

where

$$T_0 := 0, \qquad T_k := \sum_{j=1}^{k} \tau_j, \quad k \ge 1.$$

The variables $\eta(t)$ are not completely defined yet. We do not know what $\eta(t)$ is
for $\omega$ such that the level $t$ is never reached by the sequence of sums $T_k$. In that case
it is natural to put

$$\eta(t) := \infty \quad \text{if all } T_k \le t. \qquad (10.1.2)$$

Clearly, $\eta(t)$ is a stopping time (see Sect. 4.4).

Usually the random variables $\tau_2, \tau_3, \ldots$ are assumed to be identically distributed
with a finite expectation. The distribution of the random variable $\tau_1$ can be arbitrary.

We assume first that all the random variables $\tau_j$ are positive. Then definition
(10.1.1) allows us to consider $\eta(t)$ as a random function that can be described
as follows. If we plot the points $T_0 = 0, T_1, T_2, \ldots$ on the real line, then one has
$\eta(t) = 0$ on the semi-axis $(-\infty, 0)$, $\eta(t) = 1$ on the semi-interval $[0, T_1)$, $\eta(t) = 2$
on the semi-interval $[T_1, T_2)$ and so on.
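
The description of $\eta(t)$ as a step function can be turned into a few lines of code. This is a minimal sketch for a finite sample of steps; the function name `renewal_count` is hypothetical, and returning `None` when the level is not exceeded within the supplied steps is a convention of this sketch (a finite sample cannot decide whether $\eta(t) = \infty$).

```python
from typing import Iterable, Optional


def renewal_count(taus: Iterable[float], t: float) -> Optional[int]:
    """eta(t) = min{k >= 0 : T_k > t}, with T_0 = 0 and T_k = tau_1 + ... + tau_k.

    Returns None if no partial sum built from `taus` exceeds t.
    """
    total = 0.0
    if total > t:          # T_0 = 0 already exceeds a negative level
        return 0
    for k, tau in enumerate(taus, start=1):
        total += tau
        if total > t:
            return k
    return None


if __name__ == "__main__":
    taus = [2.0, 1.5, 3.0]          # partial sums: 2.0, 3.5, 6.5
    for level in (-1.0, 0.0, 1.9, 2.0, 3.5, 6.5):
        print(level, renewal_count(taus, level))
```

For positive steps the output reproduces the picture above: the value 0 for negative levels, 1 on $[0, T_1)$, 2 on $[T_1, T_2)$, and so on; the strict inequality $T_k > t$ is what makes the function right-continuous.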

The sequence $\{T_k\}_{k=0}^{\infty}$ is also often called a renewal process. Sometimes we will
call the sequence $\{T_k\}$ a random walk. The quantity $\eta(t)$ can also be called the first
passage time of the level $t$ by the random walk $\{T_k\}_{k=0}^{\infty}$.

If, based on the sequence $\{T_k\}$, we construct a random walk $T(x)$ in continuous
time:

$$T(x) := T_k \quad \text{for } x \in [k, k + 1), \quad k \ge 0,$$

then the renewal process $\eta(t)$ will be the generalised inverse of $T(x)$:

$$\eta(t) = \inf\{x \ge 0 : T(x) > t\}.$$


The term "renewal process" is related to the fact that the function $\eta(t)$ and the
sequence $\{T_k\}$ are often used to describe the operation of various physical devices
comprising replaceable components. If, say, $\tau_j$ is the failure-free operating time
of such a component, after which the latter requires either replacement or repair
("renewal", which is supposed to happen immediately), then $T_k$ will denote the time
of the $k$-th "renewal" of the component, while $\eta(t)$ will be equal to the number of
"renewals" which have occurred by the time $t$.




Remark 10.1.1 If the $j$-th renewal of the component does not happen immediately
but requires a time $\tau_j' \ge 0$, then, introducing the random variables

$$\tau_j^* := \tau_j + \tau_j', \qquad T_k^* := \sum_{j=1}^{k} \tau_j^*, \qquad \eta^*(t) := \min\{k : T_k^* > t\},$$

we get an object of the same nature as before, with nearly the same physical meaning.
For such an object, a number of additional results can be obtained, see e.g.,
Remark 10.3.1.

Renewal processes are also quite often used in probabilistic research per se, and 
also when studying other processes for which there exist so-called “regeneration 
times” after which the evolution of the process starts anew. Below we will encounter 
examples of such use of renewal processes. 

Now we return to the general case where $\tau_j$ may assume both positive and negative
values.


Definition 10.1.2 The function

$$H(t) := \mathbf{E}\eta(t), \qquad t \ge 0,$$

is called the renewal function for the sequence $\{T_k\}_{k=0}^{\infty}$.
In the existing literature, another definition is used more frequently.


Definition 10.1.2A The renewal function for the sequence $\{T_k\}_{k=0}^{\infty}$ is defined by

$$U(t) := \sum_{j=0}^{\infty} \mathbf{P}(T_j \le t).$$


The values of $H(t)$ and $U(t)$ can be infinite.
If $\tau_j \ge 0$ then the above definitions are equivalent. Indeed, for $t \ge 0$, consider
the random variable

$$\nu(t) := \max\{k : T_k \le t\} = \eta(t) - 1.$$

Then clearly

$$\sum_{j=0}^{\infty} I(T_j \le t) = 1 + \nu(t),$$

where $I(A)$ is the indicator of the event $A$, and

$$U(t) = 1 + \mathbf{E}\nu(t) = \mathbf{E}\eta(t) = H(t).$$

The value $U(t) = \mathbf{E}\nu(t) + 1$ is the mean time spent by the trajectory $\{T_k\}_{k=0}^{\infty}$ in the
interval $[0, t]$.

If $\tau_j$ can take values of different signs then clearly $\nu(t) \ge \eta(t) - 1$ and, with a positive
probability, $\nu(t) > \eta(t) - 1$ (the trajectory $\{T_k\}$, after crossing the level $t$, can return
to the region $(-\infty, t]$). Therefore in that case $U(t) > H(t)$. Thus for $\tau_j$ taking
values of different signs we have two versions of the renewal function given in Definitions
10.1.2 and 10.1.2A. We will call them the first and the second versions,
respectively. In the present chapter we will consider the first version only (Definition 10.1.2).
The second version is discussed in Appendix 9.

Note that, for $\tau_j$ assuming values of both signs and $t < 0$, we have $H(t) = 0$,
$U(t) > 0$, so the function $H(t)$ has a jump of magnitude 1 at the point $t = 0$.

Note also that the functions $H(t)$ and $U(t)$ we defined above are right-continuous.
In the existing literature, one often considers left-continuous versions
of renewal functions defined respectively as

$$H(t - 0) = \mathbf{E}\min\{k : T_k \ge t\} \quad \text{and} \quad U(t - 0) = \sum_{j=0}^{\infty} \mathbf{P}(T_j < t).$$

If all $\tau_j$ are identically distributed and $F^{*k}(t)$ is the $k$-fold convolution of the distribution
function $F(t) = \mathbf{P}(\tau_j < t)$, then the second left-continuous version of the
renewal function can also be represented in the form

$$U(t - 0) = \sum_{k=0}^{\infty} F^{*k}(t),$$

where $F^{*0}$ corresponds to the distribution degenerate at zero.

From the point of view of the exposition below, it makes no difference which
version of continuity is chosen. For several reasons, in the present chapter it will be
more convenient for us to deal with right-continuous renewal functions. Everything
below will equally apply to left-continuous renewal functions as well.


10.1.2 The Integral Renewal Theorem for Non-identically 
Distributed Summands 


In the case where $\tau_j$, $j \ge 2$, are not necessarily identically distributed and do not
possess other homogeneity properties, singling out the random variable $\tau_1$ makes
little sense.


Theorem 10.1.1 Let $\tau_j$, $j \ge 1$, be uniformly integrable from the right, $\mathbf{E}|T_N| < \infty$
for any fixed $N$ and $a_k = \mathbf{E}\tau_k \to a > 0$ as $k \to \infty$. Then the following limit exists

$$\lim_{t \to \infty} \frac{H(t)}{t} = \frac{1}{a}. \qquad (10.1.3)$$

Proof We will need the following definition.


Definition 10.1.3 The random variable

$$\chi(t) := T_{\eta(t)} - t > 0$$

is said to be the excess of the level $t$ (or overshoot over the level $t$) for the random
walk $\{T_j\}$.


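Lemma 10.1.1 If $a_k \in [a_*, a^*]$, $a_* > 0$, then

$$\mathbf{E}\eta(t) \ge \frac{t}{a^*}, \qquad \limsup_{t \to \infty} \frac{\mathbf{E}\eta(t)}{t} \le \frac{1}{a_*}. \qquad (10.1.4)$$


Proof By Theorem 4.4.2 (see also Example 4.4.3)

$$\mathbf{E}T_{\eta(t)} = t + \mathbf{E}\chi(t) \le a^* \mathbf{E}\eta(t).$$

This implies the first inequality in (10.1.4). Now introduce truncated random variables
$\tau_j^{(s)} := \min(\tau_j, s)$. By virtue of the uniform integrability, one can choose an $s$
such that, for a given $\varepsilon \in (0, a_*)$, we would have

$$a_{j,s} := \mathbf{E}\tau_j^{(s)} \ge a_* - \varepsilon.$$

Then, by Theorem 4.4.2,

$$t + s \ge \mathbf{E}T^{(s)}_{\eta^{(s)}(t)} \ge (a_* - \varepsilon)\mathbf{E}\eta^{(s)}(t),$$

where

$$T_n^{(s)} := \sum_{j=1}^{n} \tau_j^{(s)}, \qquad \eta^{(s)}(t) := \min\{k : T_k^{(s)} > t\}.$$

Since $\eta(t) \le \eta^{(s)}(t)$, one has

$$H(t) = \mathbf{E}\eta(t) \le \mathbf{E}\eta^{(s)}(t) \le \frac{t + s}{a_* - \varepsilon}. \qquad (10.1.5)$$

As $\varepsilon > 0$ can be chosen arbitrarily, we obtain that

$$\limsup_{t \to \infty} \frac{H(t)}{t} \le \frac{1}{a_*}.$$

The lemma is proved.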


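We return to the proof of Theorem 10.1.1. For a given $\varepsilon > 0$, find an $N$ such
that $a_k \in [a - \varepsilon, a + \varepsilon]$ for all $k > N$ and denote by $H_N(t)$ the renewal function
corresponding to the sequence $\{\tau_{N+k}\}_{k=1}^{\infty}$. Then

$$H(t) = \mathbf{E}(\eta(t); T_N > t) + \int_{-\infty}^{t} \mathbf{P}(T_N \in du)[N + H_N(t - u)]$$
$$= \mathbf{E}[H_N(t - T_N); T_N \le t] + r_N, \qquad (10.1.6)$$

where

$$r_N := \mathbf{E}(\eta(t); T_N > t) + N\mathbf{P}(T_N \le t) \le N\mathbf{P}(T_N > t) + N\mathbf{P}(T_N \le t) = N.$$

Relation (10.1.5) implies that there exist constants $c_1$, $c_2$, such that, for all $t$,

$$H_N(t) \le c_1 + c_2 t.$$

Therefore, for fixed $N$ and $M$,

$$R_{N,M} := \mathbf{E}[H_N(t - T_N); |T_N| \ge M, T_N \le t] \le (c_1 + c_2 t)\mathbf{P}(|T_N| \ge M, T_N \le t) + c_2 \mathbf{E}|T_N|.$$

Choose an $M$ such that $c_2 \mathbf{P}(|T_N| \ge M) < \varepsilon$. Then

$$\limsup_{t \to \infty} \frac{r_N + R_{N,M}}{t} \le \varepsilon. \qquad (10.1.7)$$

To bound $H(t)$ in (10.1.6) it remains to consider, for the chosen $N$ and $M$, the
function

$$H_{N,M}(t) := \mathbf{E}[H_N(t - T_N); |T_N| < M].$$

By Lemma 10.1.1,

$$\limsup_{t \to \infty} \frac{H_{N,M}(t)}{t} \le \frac{1}{a - \varepsilon},$$

$$\liminf_{t \to \infty} \frac{H_{N,M}(t)}{t} \ge \frac{\mathbf{P}(|T_N| < M)}{a + \varepsilon} \ge \frac{1 - \varepsilon/c_2}{a + \varepsilon}.$$

This together with (10.1.6) and (10.1.7) yields

$$\limsup_{t \to \infty} \frac{H(t)}{t} \le \frac{1}{a - \varepsilon} + \varepsilon, \qquad \liminf_{t \to \infty} \frac{H(t)}{t} \ge \frac{1 - \varepsilon/c_2}{a + \varepsilon}.$$

Since $\varepsilon$ is arbitrary, the foregoing implies (10.1.3).
The theorem is proved.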


Remark 10.1.2 One can obtain the following generalisation of Theorem 10.1.1, in
which no restrictions on $\tau_1 \ge 0$ are imposed. Let $\tau_1$ be an arbitrary nonnegative
random variable, and let $\tau_j^* := \tau_{j+1}$, $j \ge 1$, satisfy the conditions of Theorem 10.1.1. Then
(10.1.3) still holds true.

This assertion follows from the relations

$$H(t) = \mathbf{P}(\tau_1 > t) + \int_0^t \mathbf{P}(\tau_1 \in dv)\, H^*(t - v), \qquad (10.1.8)$$

where $H^*(t)$ corresponds to the sequence $\{\tau_j^*\}$ and, for each fixed $N$ and $v \le N$,

$$\frac{H^*(t - v)}{t} = \frac{H^*(t - v)}{t - v} \cdot \frac{t - v}{t} \to \frac{1}{a}$$

as $t \to \infty$. Therefore

$$\frac{1}{t} \int_0^N \mathbf{P}(\tau_1 \in dv)\, H^*(t - v) \to \frac{\mathbf{P}(\tau_1 \le N)}{a}.$$

For the remaining part of the integral in (10.1.8), we have

$$\limsup_{t \to \infty} \frac{1}{t} \int_N^t \mathbf{P}(\tau_1 \in dv)\, H^*(t - v) \le \limsup_{t \to \infty} \frac{H^*(t)}{t}\, \mathbf{P}(\tau_1 > N) = \frac{\mathbf{P}(\tau_1 > N)}{a}.$$

Since the probability $\mathbf{P}(\tau_1 > N)$ can be made arbitrarily small by the choice of $N$,
the assertion is proved.




It is not difficult to verify that the condition $\tau_1 \ge 0$ can be relaxed to the condition
$\mathbf{E}\min(0, \tau_1) > -\infty$. However, if $\mathbf{E}\min(0, \tau_1) = -\infty$, then $H(t) = \infty$ and relation
(10.1.3) does not hold.

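Obtaining an analogue of Theorem 10.1.1 for the second version $U(t)$ of the
renewal function in the case of uniformly integrable $\tau_j$ taking values of both signs is
accompanied by greater technical difficulties and additional conditions. For a fixed
$\varepsilon > 0$, split the series $U(t) = \sum_{n=0}^{\infty} \mathbf{P}(T_n \le t)$ into the three parts

$$\Sigma_1 := \sum_{n < \frac{t(1-\varepsilon)}{a}}, \qquad \Sigma_2 := \sum_{|n - \frac{t}{a}| \le \frac{t\varepsilon}{a}}, \qquad \Sigma_3 := \sum_{n > \frac{t(1+\varepsilon)}{a}}.$$

By the law of large numbers (see Corollary 8.3.2),

$$\frac{T_n}{n} \xrightarrow{p} a.$$

Therefore, for $n < \frac{t(1-\varepsilon)}{a}$,

$$\mathbf{P}(T_n \le t) \ge \mathbf{P}\Bigl(\frac{T_n}{n} \le \frac{a}{1 - \varepsilon}\Bigr) \to 1$$

and hence

$$\frac{1}{t}\Sigma_1 \to \frac{1 - \varepsilon}{a}.$$

The second sum allows the trivial bound

$$\frac{1}{t}\Sigma_2 < \frac{2\varepsilon}{a},$$

where the right-hand side can be made arbitrarily small by the choice of $\varepsilon$.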


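The main difficulties are related to estimating $\Sigma_3$. To illustrate the problems
arising here we confine ourselves to the case of identically distributed $\tau_j \stackrel{d}{=} \tau$. In
this case the required estimate for $\Sigma_3$ can only be obtained under the condition
$\mathbf{E}(\tau^-)^2 < \infty$, $\tau^- := \max(0, -\tau)$. Assume without losing generality that $\mathbf{E}\tau^2 < \infty$.
(If $\mathbf{E}(\tau^+)^2 = \infty$, $\tau^+ := \max(0, \tau)$, then introducing truncated random variables
$\tau_j^{(s)} := \min(s, \tau_j)$, we obtain, using obvious conventions concerning notations, that
$\mathbf{P}(T_n \le t) \le \mathbf{P}(T_n^{(s)} \le t)$, $U(t) \le U^{(s)}(t)$ and $\Sigma_3 \le \Sigma_3^{(s)}$, where $\mathbf{E}(\tau^{(s)})^2 < \infty$ and
the value of $\mathbf{E}\tau^{(s)}$ can be made arbitrarily close to $a$ by the choice of $s$.) In the
case $\mathbf{E}\tau^2 < \infty$ we can use Theorem 9.5.1 by virtue of which, for a regularly varying
left tail $W(t) = \mathbf{P}(\tau < -t) = t^{-\beta}L(t)$ ($L(t)$ is a slowly varying function) and
$n > \frac{t}{a}(1 + \varepsilon)$, we have

$$\mathbf{P}(T_n \le t) = \mathbf{P}(T_n - an \le -(an - t)) \sim nW(an - t).$$

By the properties of slowly varying functions (see Appendix 6), for the values
$u = n/t$ comparable to 1, $n > \frac{t}{a}(1 + \varepsilon)$ and $t \to \infty$, we have

$$\frac{W(an - t)}{W(\varepsilon t)} \sim \Bigl(\frac{au - 1}{\varepsilon}\Bigr)^{-\beta}.$$

Thus for $\beta > 2$, as $t \to \infty$,

$$\Sigma_3 = \sum_{n > \frac{t(1+\varepsilon)}{a}} \mathbf{P}(T_n \le t) \sim \int_{\frac{t(1+\varepsilon)}{a}}^{\infty} vW(av - t)\,dv \sim t^2 W(\varepsilon t) \int_{\frac{1+\varepsilon}{a}}^{\infty} u\Bigl(\frac{au - 1}{\varepsilon}\Bigr)^{-\beta} du \sim c(\varepsilon)\,t^2 W(t) = o(1).$$

Summarising, we have obtained that

$$\lim_{t \to \infty} \frac{U(t)}{t} = \frac{1}{a}.$$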

Now if $\mathbf{E}(\tau^-)^2 = \infty$ then $U(t) = \infty$ for all $t$. In this case, instead of $U(t)$ one
studies the "local" renewal function

$$U(t, h) = \sum_{n} \mathbf{P}(T_n \in (t, t + h])$$

which is always finite provided that $a > 0$ and has all the properties of the increment
$H(t + h) - H(t)$ to be studied below (see e.g. [12]).

In view of the foregoing and since the function $H(t)$ will be of principal interest
to us, in what follows we will restrict ourselves to studying the first version of the
renewal function, as was noted above. We will mainly pay attention to the asymptotic
behaviour of the increments $H(t + h) - H(t)$ as $t \to \infty$. To this is closely
related a more general problem that often appears in applications: the problem on
the asymptotic behaviour as $t \to \infty$ of integrals (see e.g. Chap. 13)

$$\int_0^t g(t - y)\,dH(y) \qquad (10.1.9)$$

for functions $g(v)$ such that

$$\int_0^{\infty} |g(v)|\,dv < \infty.$$

Theorems describing the asymptotic behaviour of (10.1.9) will be called the key
renewal theorems. The next sections and Appendix 9 will be devoted to these theorems.
Due to the technical complexity of the mentioned problems, we will confine
ourselves to considering only the case where $\tau_j$, $j \ge 2$, are identically distributed.

Note that in some special cases the above problems can be solved in a very simple
way, since the renewal function $H(t)$ can be found there explicitly. To do this, as it
follows from Wald's identity used above, it suffices to find $\mathbf{E}\chi(t)$ in explicit form.
If, for instance, $\tau_j$ are integer-valued, $\mathbf{P}(\tau_j = 1) > 0$ and $\mathbf{P}(\tau_j \ge 2) = 0$, for all
$j \ge 1$, then $\chi(t) \equiv 1$ and Wald's identity yields $H(t) = (t + 1)/a$. Similar equalities
will hold if $\mathbf{P}(\tau_j > t) = ce^{-\gamma t}$ for $t > 0$ and $\gamma > 0$ (if $\tau_j$ are integer-valued, then $t$
takes only integer values in this formula). In that case the distribution of $\chi(t)$ will
be exponential and will not depend on $t$ (for more details, see the exposition below
and also Chap. 15).
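
The identity $H(t) = (t + 1)/a$ for walks that cannot jump over an integer level can be checked by direct computation. The sketch below makes the following assumptions, none of which come from the text: the steps are $+1$ with probability `p_up` and $-1$ otherwise (so that $a = 2p_{\text{up}} - 1 > 0$), the level is a nonnegative integer, and $\mathbf{E}\eta(t) = \sum_{n \ge 0} \mathbf{P}(\eta(t) > n)$ is truncated after `n_steps` terms, where the neglected tail is exponentially small.

```python
def mean_first_passage(p_up: float, level: int, n_steps: int = 400) -> float:
    """Approximate E eta(level) for i.i.d. steps +1 (prob. p_up) and -1.

    eta(level) = min{k >= 0 : T_k > level}; the sum of P(eta > n) is truncated.
    """
    if level < 0:
        return 0.0                      # T_0 = 0 already exceeds the level
    q_down = 1.0 - p_up
    # alive[pos] = P(T_n = pos and T_0, ..., T_n <= level)
    alive = {0: 1.0}
    expectation = 0.0
    for _ in range(n_steps):
        expectation += sum(alive.values())      # adds P(eta > n)
        nxt = {}
        for pos, prob in alive.items():
            if pos + 1 <= level:                # an up-step that does not cross
                nxt[pos + 1] = nxt.get(pos + 1, 0.0) + prob * p_up
            nxt[pos - 1] = nxt.get(pos - 1, 0.0) + prob * q_down
        alive = nxt
    return expectation


if __name__ == "__main__":
    p_up = 0.75
    a = 2.0 * p_up - 1.0
    for level in (0, 1, 3, 10):
        print(level, mean_first_passage(p_up, level), (level + 1) / a)
```

The two printed columns agree up to rounding, since the overshoot of such a walk over an integer level is always exactly 1.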


10.2 The Key Renewal Theorem in the Arithmetic Case 


We will distinguish between two distribution types for $\tau_j$: arithmetic in an extended
sense (when the lattice span is not necessarily 1; for the definition of arithmetic distributions
see Sect. 7.1) and all other distributions that we will call non-arithmetic. It is
clear that, say, a random variable taking values 1 and $\sqrt{2}$ with positive probabilities
cannot be arithmetic.

In the present section, we will consider the arithmetic case. Without loss of generality,
we will assume that the lattice span is 1. Then the functions $\mathbf{P}(\tau_j < t)$ and $H(t)$
will be completely determined by their values at integer points $t = k$, $k = 0, 1, 2, \ldots$

First we consider the case where the $\tau_j$ are positive, $\tau_j \stackrel{d}{=} \tau$ for $j \ge 2$. In that
case, the difference

$$h(k) := H(k) - H(k - 1) = \sum_{j=0}^{\infty} \mathbf{P}(T_j = k), \qquad k \ge 1,$$

is equal to the expectation of the number of visits of the point $k$ by the walk $\{T_j\}$.
Put

$$q_k := \mathbf{P}(\tau_1 = k), \qquad p_k := \mathbf{P}(\tau = k).$$

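Definition 10.2.1 A renewal process $\eta(t)$ will be called homogeneous and denoted
by $\eta_0(t)$ if

$$q_k = \frac{1}{a} \sum_{j=k}^{\infty} p_j, \quad k = 1, 2, \ldots, \qquad a = \mathbf{E}\tau \quad \Bigl(\text{so that } \sum_{k=1}^{\infty} q_k = 1\Bigr). \qquad (10.2.1)$$

If we denote by $p(z)$ the generating function

$$p(z) = \mathbf{E}z^{\tau} = \sum_{k=1}^{\infty} p_k z^k,$$

then the generating function $q(z) = \mathbf{E}z^{\tau_1} = \sum_{k=1}^{\infty} q_k z^k$ will be equal to

$$q(z) = \frac{1}{a} \sum_{k=1}^{\infty} z^k \sum_{j=k}^{\infty} p_j = \frac{1}{a} \sum_{j=1}^{\infty} p_j \sum_{k=1}^{j} z^k = \frac{1}{a} \sum_{j=1}^{\infty} p_j\, \frac{z(1 - z^j)}{1 - z} = \frac{z(1 - p(z))}{a(1 - z)}.$$

As we will see below, the term "homogeneous" for the process $\eta_0(t)$ is quite justified.
One of the reasons for its use is the following exact (non-asymptotic) equality.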


Theorem 10.2.1 For a homogeneous renewal process $\eta_0(t)$, one has

$$H_0(k) := \mathbf{E}\eta_0(k) = 1 + \frac{k}{a}.$$


Proof Consider the generating function $r(z)$ for the sequence $h_0(k) = H_0(k) - H_0(k - 1)$:

$$r(z) = \sum_{k=1}^{\infty} z^k h_0(k) = \sum_{k=1}^{\infty} z^k \sum_{j=1}^{\infty} \mathbf{P}(T_j = k) = \sum_{j=1}^{\infty} \mathbf{E}z^{T_j} = q(z) \sum_{j=0}^{\infty} p^j(z) = \frac{q(z)}{1 - p(z)} = \frac{z}{a(1 - z)}.$$

This implies that $h_0(k) = 1/a$. Since $H_0(0) = 1$, one has $H_0(k) = 1 + k/a$. The
theorem is proved.
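
The exact equality $h_0(k) = 1/a$ can be verified in rational arithmetic for a concrete step distribution. The sketch below is an illustration under assumptions: the pmf $p_1 = 1/2$, $p_2 = 1/3$, $p_3 = 1/6$ is an arbitrary example, and the function names are hypothetical. It builds $q_k$ from (10.2.1) and computes $h_0(k) = \sum_{j \ge 1} \mathbf{P}(T_j = k)$ by conditioning on the first step.

```python
from fractions import Fraction


def renewal_sequence(p: dict, kmax: int) -> list:
    """u(k) = sum_{j>=0} P(tau_1 + ... + tau_j = k) for i.i.d. steps with pmf p.

    Uses u(0) = 1 and u(k) = sum_j p_j * u(k - j); steps are positive integers.
    """
    u = [Fraction(1)] + [Fraction(0)] * kmax
    for k in range(1, kmax + 1):
        u[k] = sum((pj * u[k - j] for j, pj in p.items() if j <= k), Fraction(0))
    return u


def homogeneous_first_step(p: dict) -> dict:
    """q_k = (1/a) * sum_{j>=k} p_j, as in (10.2.1), with a = E tau."""
    a = sum(j * pj for j, pj in p.items())
    return {k: sum(pj for j, pj in p.items() if j >= k) / a
            for k in range(1, max(p) + 1)}


def visit_means(q: dict, p: dict, kmax: int) -> list:
    """h(k), k = 1..kmax, when tau_1 has pmf q and tau_2, tau_3, ... have pmf p."""
    u = renewal_sequence(p, kmax)
    return [sum((qi * u[k - i] for i, qi in q.items() if i <= k), Fraction(0))
            for k in range(1, kmax + 1)]


if __name__ == "__main__":
    p = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}   # a = 5/3
    q = homogeneous_first_step(p)
    print(q, sum(q.values()))
    print(visit_means(q, p, 10))    # every entry should equal 1/a
```

Replacing `q` by any other first-step distribution makes the entries of `visit_means` depend on $k$, which is what the local renewal theorem below addresses.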


Sometimes the process $\eta_0(t)$ is also called stationary. As we will see below,
it would be more appropriate to call it a process with stationary increments (see
Sect. 22.1).

The asymptotic regular behaviour of the function $h(k)$ as $k \to \infty$ persists in the
case of arbitrary $\tau_1$ as well.

Denote by $d$ the greatest common divisor (g.c.d.) of the possible values of $\tau$:

$$d := \mathrm{g.c.d.}\{k : p_k > 0\},$$

and let $g(k)$, $k = 0, 1, \ldots$, be an arbitrary sequence such that

$$\sum_{k=0}^{\infty} |g(k)| < \infty.$$


Theorem 10.2.2 (The key renewal theorem) If $d = 1$, $\tau_1$ is an arbitrary integer-valued
random variable and $\tau_j \stackrel{d}{=} \tau > 0$ for $j \ge 2$, then, as $k \to \infty$,

$$h(k) := H(k) - H(k - 1) \to \frac{1}{a}, \qquad \sum_{l=1}^{k} h(l)g(k - l) \to \frac{1}{a} \sum_{m=0}^{\infty} g(m).$$

These two relations are equivalent.
The first assertion of the theorem is also called the local renewal theorem.
To prove the theorem we will need two auxiliary assertions.


Lemma 10.2.1 Let all $\tau_j$ be identically distributed and $\nu \ge 1$ be a Markov
time with respect to the collection of $\sigma$-algebras $\{\mathfrak{F}_n\}$, where $\mathfrak{F}_n$ is independent
of $\sigma(\tau_{n+1}, \tau_{n+2}, \ldots)$. Then the $\sigma$-algebra generated by the random variables $\nu$,
$\tau_1, \ldots, \tau_\nu$, and the $\sigma$-algebra $\sigma\{\tau_{\nu+1}, \tau_{\nu+2}, \ldots\}$ are independent. The sequence
$\{\tau_{\nu+1}, \tau_{\nu+2}, \ldots\}$ has the same distribution as $\{\tau_1, \tau_2, \ldots\}$.


Thus, in spite of their random numbers, the elements of the sequence $\tau_{\nu+j}$ are
distributed as $\tau_j$.


Proof For given Borel sets $B_1, B_2, \ldots$, $C_1, C_2, \ldots$ put

$$A := \{\nu \in N,\ \tau_1 \in B_1, \ldots, \tau_\nu \in B_\nu\}, \qquad D_\nu := \{\tau_{\nu+1} \in C_1, \ldots, \tau_{\nu+k} \in C_k\},$$

where $N$ is a given set of integers and $k$ is arbitrary. Since $\mathbf{P}(D_j) = \mathbf{P}(D_0)$ and the
events $D_j$ and $\{\nu = j\}$ are independent, the total probability formula yields

$$\mathbf{P}(D_\nu) = \sum_{j=1}^{\infty} \mathbf{P}(\nu = j, D_j) = \sum_{j=1}^{\infty} \mathbf{P}(\nu = j)\mathbf{P}(D_j) = \mathbf{P}(D_0).$$

Therefore, by Theorem 3.4.3, in order to prove the required independence of the
$\sigma$-algebras, it suffices to show that $\mathbf{P}(D_\nu A) = \mathbf{P}(D_0)\mathbf{P}(A)$.
By the total probability formula,

$$\mathbf{P}(D_\nu A) = \sum_{j \in N} \mathbf{P}(D_\nu A\{\nu = j\}) = \sum_{j \in N} \mathbf{P}(D_j A\{\nu = j\}).$$

But the event $A\{\nu = j\}$ belongs to $\mathfrak{F}_j$, whereas $D_j \in \sigma(\tau_{j+1}, \ldots, \tau_{j+k})$. Therefore
$D_j$ and $A\{\nu = j\}$ are independent events and

$$\mathbf{P}(D_j A\{\nu = j\}) = \mathbf{P}(D_j)\mathbf{P}(A\{\nu = j\}) = \mathbf{P}(D_0)\mathbf{P}(A\{\nu = j\}), \qquad j \ge 1.$$

From here it clearly follows that $\mathbf{P}(D_\nu A) = \mathbf{P}(D_0)\mathbf{P}(A)$. The lemma is proved.


Lemma 10.2.2 Let $\zeta_1, \zeta_2, \ldots$ be independent arithmetic identically and symmetrically
distributed random variables with zero expectation $\mathbf{E}\zeta_j = 0$. Put $Z_n := \sum_{j=1}^{n} \zeta_j$.
Then, for any integer $k$,

$$\nu_k := \min\{n : Z_n = k\}$$

is a proper random variable: $\mathbf{P}(\nu_k < \infty) = 1$.
The proof of the lemma is given in Sect. 13.3 (see Corollary 13.3.1).


Proof of Theorem 10.2.2 Consider two independent sequences of random variables
(we assume that they are given on a common probability space): a sequence
$\tau_1, \tau_2, \ldots$, where $\tau_1$ has an arbitrary distribution, and a sequence $\tau_1', \tau_2', \ldots$, where
$\mathbf{P}(\tau_1' = k) = q_k$ (see (10.2.1)), and $\mathbf{P}(\tau_j' = k) = \mathbf{P}(\tau_j = k) = p_k$ for $j \ge 2$ (so that
$\tau_j' \stackrel{d}{=} \tau_j$ for $j \ge 2$; the process $\eta'(t)$ constructed from the sums $T_k' = \sum_{j=1}^{k} \tau_j'$ is
homogeneous (see Definition 10.2.1)).

Set $\nu := \min\{n \ge 1 : T_n = T_n'\}$. It is clearly a Markov time with respect to the
sequence $\{\tau_j, \tau_j'\}$. We show that $\mathbf{P}(\nu < \infty) = 1$. Put

$$Z_n := \sum_{j=2}^{n} (\tau_j - \tau_j') \quad \text{for } n \ge 2, \qquad Z_1 := 0, \qquad Z_0 := \tau_1 - \tau_1'.$$

Then

$$\nu = \min\{n \ge 1 : Z_n = -Z_0\}.$$

By Lemma 10.2.2 ($\zeta_j = \tau_j - \tau_j'$ have a symmetric distribution for $j \ge 2$), for each
integer $k$ the variable $\nu_k = \min\{n \ge 1 : Z_n = k\}$ is proper. Since $Z_n$ for $n \ge 1$ and
$Z_0$ are independent, we have

$$\mathbf{P}(\nu < \infty) = \sum_{k} \mathbf{P}(Z_0 = -k)\mathbf{P}(\nu_k < \infty) = 1.$$



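Now we will "glue together" ("couple") the sequences $\{T_n\}$ and $\{T_n'\}$. Since
$T_\nu = T_\nu'$ and $\nu$ is a Markov time, by Lemma 10.2.1 one can replace $\tau_{\nu+1}, \tau_{\nu+2}, \ldots$
with $\tau_{\nu+1}', \tau_{\nu+2}', \ldots$ (and thereby replace $T_{\nu+1}, T_{\nu+2}, \ldots$ with $T_{\nu+1}', T_{\nu+2}', \ldots$) without
changing the distribution of the sequence $\{T_n\}$.

Therefore, on the set $\{T_\nu < k\}$ one has $\eta(t) = \eta'(t)$ for $t \ge k - 1$ and hence

$$h(k) = \mathbf{E}(\eta(k) - \eta(k - 1))$$
$$= \mathbf{E}[\eta'(k) - \eta'(k - 1);\ T_\nu < k] + \mathbf{E}[\eta(k) - \eta(k - 1);\ T_\nu \ge k]$$
$$= \frac{1}{a} - \mathbf{E}[\eta'(k) - \eta'(k - 1);\ T_\nu \ge k] + \mathbf{E}[\eta(k) - \eta(k - 1);\ T_\nu \ge k].$$

Since $|\eta(k) - \eta(k - 1)| \le 1$, we have

$$\Bigl|h(k) - \frac{1}{a}\Bigr| \le \mathbf{P}(T_\nu \ge k) \to 0$$

as $k \to \infty$. The first assertion of Theorem 10.2.2 is proved.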
Since $h(k) \le 1$, we can make the value of

$$\Big|\sum_{l=1}^{k-N-1} h(l)\,g(k-l)\Big| \le \sum_{l=N+1}^{k-1} |g(l)| \le \sum_{l=N+1}^{\infty} |g(l)|$$

arbitrarily small by choosing an appropriate $N$. Moreover, by virtue of the first assertion, for any fixed $N$,

$$\sum_{l=k-N}^{k} h(l)\,g(k-l) \to \frac{1}{a} \sum_{l=0}^{N} g(l) \quad \text{as } k \to \infty.$$

This implies the second assertion of the theorem.
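The first assertion can be checked numerically. The following Python sketch assumes that all $\tau_j$ are identically distributed with a toy three-point law (this law and the function name `renewal_probabilities` are illustrative assumptions, not taken from the text). It computes $h(k) = \sum_l P(T_l = k)$ from the renewal equation $h(0) = 1$, $h(k) = \sum_{i=1}^{k} p_i h(k-i)$ and compares $h(k)$ with $1/a$.

```python
def renewal_probabilities(p, k_max):
    """Return [h(0), ..., h(k_max)] for i.i.d. integer jumps with P(tau = i) = p[i], i >= 1."""
    h = [1.0]  # h(0) = 1 because T_0 = 0
    for k in range(1, k_max + 1):
        # Condition on the size i of the last jump before hitting k.
        h.append(sum(q * h[k - i] for i, q in p.items() if i <= k))
    return h

# Hypothetical jump law with g.c.d. of possible values equal to 1.
p = {1: 0.2, 2: 0.5, 4: 0.3}
a = sum(i * q for i, q in p.items())  # mean jump, here 2.4
h = renewal_probabilities(p, 200)
print(h[5], h[50], h[200], 1.0 / a)
```

For this law the values $h(k)$ settle near $1/a$ geometrically fast, in agreement with the theorem; a law whose possible values have g.c.d. greater than 1 would not converge.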


Remark 10.2.1 The coupling of $\{T_n\}$ and $\{T'_n\}$ in the proof of Theorem 10.2.2 could be done earlier, at the time $\gamma := \min\{n \ge 1 : T_n \in \mathcal{T}'\}$, where $\mathcal{T}'$ is the set of points $\mathcal{T}' = \{T'_1, T'_2, \ldots\}$.


Theorem 10.2.3 The assertion of Theorem 10.2.2 remains true for arbitrary (assuming values of different signs) $\tau_j$.


Proof We will reduce the problem to the case $\tau_j > 0$. First let all $\tau_j$ be identically distributed. Consider the random variable $\chi_1 = \chi(0)$ that we will call the first positive sum. We will show in Chap. 12 (see Corollary 12.2.3) that $E\chi_1 < \infty$ if $a = E\tau_j < \infty$. According to Lemma 10.2.1, the sequence $\tau_{\eta(0)+1}, \tau_{\eta(0)+2}, \ldots$ will have the same distribution as $\tau_1, \tau_2, \ldots$. Therefore the "second positive sum" $\chi_2$ or, which is the same, the first positive sum of the variables $\tau_{\eta(0)+1}, \tau_{\eta(0)+2}, \ldots$, will have the same distribution as $\chi_1$ and will be independent of it. The same will be true for the subsequent "overshoots" over the already achieved levels $\chi_1, \chi_1 + \chi_2, \ldots$.
Now consider the random walk

$$H_k := \sum_{i=1}^{k} \chi_i, \qquad k = 1, 2, \ldots,$$




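and put
$$\eta^*(t) := \min\{k : H_k > t\}, \qquad \chi^*(t) := H_{\eta^*(t)} - t, \qquad H^*(t) := E\eta^*(t).$$

Since $\chi_k > 0$, Theorem 10.2.2 is applicable, and therefore by Wald's identity

$$H^*(k) - H^*(k-1) = \frac{1}{E\chi_1}\big(1 + E\chi^*(k) - E\chi^*(k-1)\big) \to \frac{1}{E\chi_1},$$

$$E\chi^*(k) - E\chi^*(k-1) \to 0.$$

Note now that the distributions of the random variables $\chi(t)$ (see Definition 10.1.3) and $\chi^*(t)$ coincide. Therefore

$$H(k) - H(k-1) = \frac{1}{a}\big(1 + E\chi(k) - E\chi(k-1)\big) = \frac{1}{a}\big(1 + E\chi^*(k) - E\chi^*(k-1)\big) \to \frac{1}{a}.$$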


Now let the distributions of $\tau_1$ and $\tau_j$, $j \ge 2$, be different. Then the renewal function $H_1(t)$ for such a walk will be equal to


k k 

Ay(k) =1+ » P(t, =i)[H(k-i) +1] =1+ > P(t, =i)H(k —i), 
= = 

hy(k)= Ak) - M(kK-)= ~~ PQ =H)h(K—-i), k=O, 


where H\(—1) =0, h(O) = H(0) and the function H(t) corresponds to identically 
distributed $\tau_j$. If we had $h(k) \le c < \infty$ for all $k$, that would imply convergence $h_1(k) \to 1/a$ and thus complete the proof of the theorem.

The required inequality $h(k) \le c$ actually follows from the following general proposition which is true for arbitrary (not necessarily lattice) random variables $\tau_j$.


Lemma 10.2.3 If all $\tau_j$ are identically distributed then, for all $t$ and $u$,

$$H(t+u) - H(t) \le H(u) \le c_1 + c_2 u.$$


Proof The difference $\eta(t+u) - \eta(t)$ is the number of jumps of the trajectory $\{T'_k\}$ that started at the point $t + \chi(t) > t$ until the first passage of the level $t + u$, where the sequence $\{T'_k\}$ has the same distribution as $\{T_k\}$ and is independent of it (see Lemma 10.2.1). In other words, $\eta(t+u) - \eta(t)$ has the same distribution as $\eta'(u - \chi(t)) \le \eta'(u)$, where $\eta'$ corresponds to $\{T'_k\}$, if $\chi(t) \le u$, and $\eta(t+u) - \eta(t) = 0$ if $\chi(t) > u$. Therefore $H(t+u) - H(t) \le H(u)$. The inequality for $H(u)$ follows from Theorem 10.2.1. The lemma is proved.


Theorem 10.2.3 is proved. 




10.3 The Excess and Defect of a Random Walk. Their Limiting 
Distribution in the Arithmetic Case 


Along with the excess $\chi(t) = T_{\eta(t)} - t$ we introduce one more random variable closely related to $\chi(t)$.


Definition 10.3.1 The random variable
$$\gamma(t) := t - T_{\eta(t)-1} = t - T_{\nu(t)}$$
is called the defect (or undershoot) of the level $t$ in the walk $\{T_n\}$.


The quantity $\chi(t)$ may be thought of as the time during which the component that was working at time $t$ will continue working after that time, while $\gamma(t)$ is the time for which the component has already been working by that time.

One should not think that the sum $\chi(t) + \gamma(t)$ has the same distribution as $\tau_j$; this sum is actually equal to the value of a $\tau$ with the random subscript $\eta(t)$. In particular, as we will see below, it may turn out that $E\chi(t) > E\tau_j$ for large $t$. The following apparent paradox is related to this fact. A passenger coming to a bus stop at which buses arrive with inter-arrival times $\tau_1 > 0$, $\tau_2 > 0, \ldots$ ($E\tau_j = a$), will wait for the arrival of the next bus for a random time $\chi$ of which the mean $E\chi$ could prove to be greater than $a$.
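The bus-stop paradox is easy to observe in a simulation. The Python sketch below is only an illustration under assumed parameters: the two-point inter-arrival law (1 with probability 0.9, 21 with probability 0.1, so that $a = 3$), the sample sizes, the seed and the function name `mean_wait` are all hypothetical choices, not taken from the text. Passengers arrive at uniformly distributed times and wait for the next bus.

```python
import bisect
import random

def mean_wait(n_buses=200_000, n_passengers=50_000, seed=0):
    """Average waiting time of passengers arriving at uniformly random times."""
    rng = random.Random(seed)
    arrivals, t = [], 0.0
    for _ in range(n_buses):
        # Inter-arrival time: 1 with probability 0.9, 21 with probability 0.1.
        t += 1.0 if rng.random() < 0.9 else 21.0
        arrivals.append(t)
    horizon = arrivals[-1]
    total = 0.0
    for _ in range(n_passengers):
        s = rng.random() * horizon
        # First bus arriving at or after time s (always exists since s <= horizon).
        nxt = arrivals[bisect.bisect_left(arrivals, s)]
        total += nxt - s
    return total / n_passengers

a = 0.9 * 1.0 + 0.1 * 21.0  # mean inter-arrival time, equals 3
w = mean_wait()
print(a, w)
```

The simulated mean wait comes out well above $a = 3$: a passenger is more likely to arrive during one of the rare long gaps, because those gaps occupy most of the time axis.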

One of the principal facts of renewal theory is the assertion that, under broad assumptions, the joint distribution of $\chi(t)$ and $\gamma(t)$ has a limit as $t \to \infty$, so that for large $t$ the distribution of $\chi(t)$ does not depend on $t$ any more and becomes stationary. Denote this limiting distribution of $\chi(t)$ by $\mathbf{G}$ and its distribution function by $G$:

$$G(x) = \lim_{t \to \infty} P(\chi(t) < x). \qquad (10.3.1)$$


If we take the distribution of $\tau_1$ to be $\mathbf{G}$ then, for such a process, by its very construction the distribution of the variable $\chi(t)$ will be independent of $t$. Indeed, in that case we can think of the positive elements of $\{T_j\}$ as the renewal times for a process which is constructed from the sequence $\{\tau_j\}$ and of which the start is shifted to a point $-N$, where $N$ is very large. Since by virtue of (10.3.1) we can assume that the distributions of $\chi(N)$ and $\chi(N+t)$ coincide with each other, the distribution of the variable $\chi(t)$ (which can be identified with $\chi(N+t)$) is independent of $t$ and coincides with that of $\tau_1$. A formal proof of this fact is omitted, since it will not be used in what follows. However, the reader could carry it out using the explicit form of $G(x)$ from (10.3.1) to be derived below.

In the arithmetic case, the distribution $\mathbf{G}$ is just the law (10.2.1) used to construct the homogeneous renewal process $\eta_0(t)$. We will prove this in our next theorem.

It follows from the fact that, for the process $\eta_0(t)$, the distribution of $\chi(t)$ does not depend on $t$ and coincides with that of $\tau_1$, that the distribution of $\eta_0(t+u) - \eta_0(t)$ coincides with that of $\eta_0(u)$ and hence is also independent of $t$. It is this property that establishes the stationarity of the increments of the renewal process; we called this property homogeneity. It means that the distribution of the number of



renewals over a time interval of length $u$ does not depend on when we start counting, and therefore depends on $u$ only.

Theorems on the limiting distribution of $\chi(t)$ and $\gamma(t)$ are of interest not only from the point of view of their applications. We will need them for a variety of other problems. Again we consider first the case when the variables $\tau_j > 0$ are arithmetic. In that case the "time" can also be assumed discrete and we will denote it, as before, by the letters $n$ and $k$. Let, as before, $\tau_j \stackrel{d}{=} \tau$ for $j \ge 2$ and $p_k = P(\tau = k)$.


Theorem 10.3.1 Let the random variable $\tau > 0$ be arithmetic, $E\tau = a$ exist, $\tau_1$ be an arbitrary integer random variable, and the g.c.d. of the possible values of $\tau$ be equal to 1. Then the following limit exists

$$\lim_{k \to \infty} P(\gamma(k) = i,\ \chi(k) = j) = \frac{p_{i+j}}{a}, \qquad i \ge 0,\ j > 0. \qquad (10.3.2)$$


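It follows from Theorem 10.3.1 that

$$\lim_{k \to \infty} P(\chi(k) = i) = \frac{1}{a} \sum_{j=i}^{\infty} p_j, \quad i > 0; \qquad \lim_{k \to \infty} P(\gamma(k) = i) = \frac{1}{a} \sum_{j=i+1}^{\infty} p_j, \quad i \ge 0. \qquad (10.3.3)$$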


Proof of Theorem 10.3.1 By the renewal theorem (see Theorem 10.2.2), for $k > i$,

$$P(\gamma(k) = i,\ \chi(k) = j) = \sum_{l=1}^{\infty} P(T_l = k - i,\ \tau_{l+1} = i + j) = \sum_{l=1}^{\infty} P(T_l = k - i)\,P(\tau = i + j) = h(k - i)\,p_{i+j} \to \frac{p_{i+j}}{a}$$

as $k \to \infty$. The theorem is proved.
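The identity $P(\gamma(k) = i, \chi(k) = j) = h(k-i)\,p_{i+j}$ used in the proof lends itself to a direct numerical check. The Python sketch below assumes, for simplicity, that $\tau_1$ has the same law as the other $\tau_j$ and uses a hypothetical three-point distribution; the helper names are illustrative and not from the text.

```python
def renewal_probabilities(p, k_max):
    """Return [h(0), ..., h(k_max)] for i.i.d. integer jumps with P(tau = i) = p[i], i >= 1."""
    h = [1.0]  # h(0) = 1 because T_0 = 0
    for k in range(1, k_max + 1):
        h.append(sum(q * h[k - i] for i, q in p.items() if i <= k))
    return h

p = {1: 0.2, 2: 0.5, 4: 0.3}          # assumed toy law, g.c.d. 1
a = sum(i * q for i, q in p.items())  # mean jump
k = 300
h = renewal_probabilities(p, k)

def joint(i, j):
    """P(defect = i, excess = j) at time k, valid for k > i."""
    return h[k - i] * p.get(i + j, 0.0)

def limit(i, j):
    """Limiting value p_{i+j} / a from (10.3.2)."""
    return p.get(i + j, 0.0) / a

pairs = [(i, j) for i in range(0, 4) for j in range(1, 5)]
max_err = max(abs(joint(i, j) - limit(i, j)) for i, j in pairs)
total_mass = sum(joint(i, j) for i, j in pairs)
print(max_err, total_mass)
```

Since the largest possible jump is 4 here, the pairs with $0 \le i \le 3$, $1 \le j \le 4$ carry all the mass, so `total_mass` should be 1 up to rounding.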


If $E\tau^2 = m_2 < \infty$, then Theorem 10.3.1 allows a refinement of Theorem 10.2.2 (see Theorem 10.3.2 below).


Corollary 10.3.1 If $m_2 < \infty$, then the random variables $\chi(k)$ are uniformly integrable and

$$E\chi(k) \to \frac{1}{a} \sum_{i=1}^{\infty} i \sum_{j=i}^{\infty} p_j = \frac{m_2 + a}{2a} \quad \text{as } k \to \infty. \qquad (10.3.4)$$


Proof The uniform integrability follows from the inequalities $h(k) \le 1$,

$$P(\chi(k) = j) = \sum_{i} h(k - i)\,p_{i+j} \le \sum_{i=j}^{\infty} p_i.$$

This implies (10.3.4) (see Sect. 6.1).




Now we can state a refined version of the integral theorem that implies Theorem 10.2.2.


Theorem 10.3.2 If all $\tau_j$ are identically distributed and $E\tau^2 = m_2 < \infty$, then

$$H(k) = \frac{k}{a} + \frac{m_2 + a}{2a^2} + o(1)$$

as $k \to \infty$.


The Proof immediately follows from the Wald identity

$$H(k) = E\eta(k) = \frac{k + E\chi(k)}{a}$$

and Corollary 10.3.1.


Remark 10.3.1 For the process n*(t) corresponding to nonzero times T; required 
for components’ renewals (mentioned in Remark 10.1.1), the reader can easily find, 
similarly to Theorem 10.3.1, not only the asymptotic value p;+j;/a* of the proba- 
bility that at time k — oo the current component has already worked for time i and 
will still work for time j, but also the asymptotics of the probability that the com- 
ponent has been “under repair” for time i and will stay in that state for time j, that 
is given by Pi, j/@*, where p; = P(t; =i), a* = E(tj + 7) = Er}. 


Now consider the question of under what circumstances the distribution of the random variable $\tau_1$ for the homogeneous process (i.e. the distribution of what one could denote by $\chi(\infty)$) will coincide with that of $\tau_j$ for $j \ge 2$. Such a coincidence is equivalent to the equality

$$p_i = \frac{1}{a} \sum_{j=i}^{\infty} p_j$$

for $i = 1, 2, \ldots$, or, which is the same, to

$$a(p_i - p_{i-1}) = -p_{i-1}, \qquad p_i = \frac{a-1}{a}\,p_{i-1}, \qquad p_i = \frac{1}{a-1} \Big(\frac{a-1}{a}\Big)^{i}.$$

This means that the renewal process generated by the sequence of independent identically distributed random variables $\tau_1, \tau_2, \ldots$ is homogeneous if and only if $\tau_j$ (or, more precisely, $\tau_j - 1$) have the geometric distribution.

Denote by $\gamma$ and $\chi$ the random variables having distribution (10.3.2). Using (10.3.1), it is not hard to show that $\gamma$ and $\chi$ are independent also only in the case when $\tau_j$, $j \ge 2$, have the geometric distribution. When all $\tau_j$, $j \ge 1$, have such a distribution, $\gamma(n)$ and $\chi(n)$ are also independent, and $\chi(n) \stackrel{d}{=} \tau_1$. These facts can be proved in exactly the same way as for the exponential distribution (see Sect. 10.4).

We now return to the general case and recall that if $E\tau^2 < \infty$ then (see Corollary 10.3.1)

$$E\chi = \frac{E\tau^2}{2a} + \frac{1}{2}.$$




This means, in particular, that if the distribution of $\tau$ is such that $E\tau^2 > 2a^2 - a$, then, for large $n$, the excess mean value $E\chi(n)$ will become greater than $E\tau = a$.
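A small exact computation makes the last remark concrete. The Python sketch below uses a hypothetical two-point law, $P(\tau = 1) = 0.9$, $P(\tau = 21) = 0.1$ (chosen only for illustration), and evaluates the limiting mean excess in two ways: from the closed form $(m_2 + a)/(2a)$ and directly from the limiting distribution (10.3.3).

```python
from fractions import Fraction

# Assumed toy law: P(tau = 1) = 9/10, P(tau = 21) = 1/10.
p = {1: Fraction(9, 10), 21: Fraction(1, 10)}

a = sum(i * q for i, q in p.items())        # E tau
m2 = sum(i * i * q for i, q in p.items())   # E tau^2

# Closed form for the limiting mean excess.
closed_form = (m2 + a) / (2 * a)

# Direct evaluation: sum over i of i * (1/a) * P(tau >= i).
def tail(i):
    return sum(q for j, q in p.items() if j >= i)

direct = sum(i * tail(i) for i in range(1, max(p) + 1)) / a

print(a, m2, closed_form, direct)  # 3 45 8 8
```

Here $m_2 = 45 > 2a^2 - a = 15$, and indeed the limiting mean excess 8 exceeds the mean jump 3.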


10.4 The Renewal Theorem and the Limiting Behaviour 
of the Excess and Defect in the Non-arithmetic Case 


Recall that in this chapter by the non-arithmetic case we mean that there exists no $h > 0$ such that $P\big(\bigcup_k \{\tau = kh\}\big) = 1$, where $k$ runs over all integers. To state the key renewal theorem in that case, we will need the notion of a directly integrable function.


Definition 10.4.1 A function $g(u)$ defined on $[0, \infty)$ is said to be directly integrable if:


(1) the function $g$ is Riemann integrable¹ over any finite interval $[0, N]$; and
(2) $\sum_k \overline{g}_k < \infty$, where $\overline{g}_k = \max_{k \le u \le k+1} |g(u)|$.


It is evident that any monotonically decreasing function $g(t) \downarrow 0$ having a finite Lebesgue integral

$$\int_0^\infty g(t)\,dt < \infty$$

is directly integrable. This also holds for differences of such functions.

The notion of directly integrable functions introduced in [12] differs somewhat 
from the one just defined, although it essentially coincides with it. It will be more 
convenient for us to use Definition 10.4.1, since it allows us to simplify to some 
extent the exposition and to avoid auxiliary arguments (see Appendix 9). 


Theorem 10.4.1 (The key renewal theorem) Let $\tau_j \stackrel{d}{=} \tau \ge 0$ for $j \ge 2$ and $g$ be a directly integrable function. If the random variable $\tau$ is non-arithmetic, there exists $E\tau = a > 0$, and the distribution of $\tau_1$ is arbitrary, then, as $t \to \infty$,

$$\int_0^t g(t - u)\,dH(u) \to \frac{1}{a} \int_0^\infty g(u)\,du. \qquad (10.4.1)$$


There is a measure $\mathbf{H}$ on $[0, \infty)$ associated with $H$ that is defined by $\mathbf{H}((x, y]) := H(y) - H(x)$. The integral

$$\int_0^t g(t - u)\,dH(u)$$


¹That is, the sums $n^{-1} \sum_k \underline{g}_k$ and $n^{-1} \sum_k \overline{g}_k$ have the same limits as $n \to \infty$, where $\underline{g}_k = \min_{u \in \Delta_k} g(u)$, $\overline{g}_k = \max_{u \in \Delta_k} g(u)$, $\Delta_k = [k\Delta, (k+1)\Delta)$, and $\Delta = N/n$. The usual definition of Riemann integrability over $[0, \infty)$ assumes that condition (1) of Definition 10.4.1 is satisfied and the limit of $\int_0^N g(u)\,du$ as $N \to \infty$ exists. This approach covers a wider class of functions than in Definition 10.4.1, allowing, for example, the existence of a sequence $t_k \to \infty$ such that $g(t_k) \to \infty$.


in (10.4.1) can also be written as

$$\int_0^t g(t - u)\,\mathbf{H}(du).$$

It follows from (10.4.1), in particular, that, for any fixed $u$,
$$H(t) - H(t - u) \to \frac{u}{a}. \qquad (10.4.2)$$

It is not hard to see that this relation, which is called the local renewal theorem, is equivalent to (10.4.1).

The proof of Theorem 10.4.1 is technically rather difficult, so we have placed it in Appendix 9. One can also find there refinements of Theorem 10.4.1 and its analogue in the case where $\tau$ has a density.

The other assertions of Sects. 10.2 and 10.3 can also be extended to the non-arithmetic case without any difficulties. Let all $\tau_j$ be nonnegative.


Definition 10.4.2 In the non-arithmetic case, a renewal process $\eta(t)$ is called homogeneous (and is denoted by $\eta_0(t)$) if the distribution of the first jump has the form

$$P(\tau_1 > x) = \frac{1}{a} \int_x^\infty P(\tau > t)\,dt.$$


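The ch.f. of $\tau_1$ equals
$$\varphi_{\tau_1}(\lambda) := E e^{i\lambda\tau_1} = \frac{1}{a} \int_0^\infty e^{i\lambda x} P(\tau > x)\,dx.$$

Since here we are integrating over $x \ge 0$, the integral exists (as well as the function $\varphi(\lambda) = \varphi_\tau(\lambda) = E e^{i\lambda\tau}$) for all $\lambda$ with $\operatorname{Im}\lambda > 0$ (for $\lambda = i\alpha + v$, $-\infty < v < \infty$, $\alpha > 0$, the factor $e^{i\lambda x}$ is equal to $e^{-\alpha x} e^{ivx}$; see property 6 of ch.f.s). Therefore, for $\operatorname{Im}\lambda > 0$,

$$\varphi_{\tau_1}(\lambda) = -\frac{1}{i\lambda a}\Big[1 + \int_0^\infty e^{i\lambda x}\,dP(\tau > x)\Big] = \frac{\varphi(\lambda) - 1}{i\lambda a}.$$

Theorem 10.4.2 For a homogeneous renewal process,
$$H_0(t) \equiv E\eta_0(t) = 1 + \frac{t}{a}, \qquad t \ge 0.$$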


Proof This theorem can be proved in the same way as Theorem 10.2.1. Consider the Fourier-Stieltjes transform of the function $H_0(t)$:

$$r(\lambda) := \int_0^\infty e^{i\lambda x}\,dH_0(x).$$

Note that this transform exists for $\operatorname{Im}\lambda > 0$ and the uniqueness theorem established for ch.f.s remains true for it, since $\varphi^*(v) := r(i\alpha + v)/r(i\alpha)$, $-\infty < v < \infty$ (we put $\lambda = i\alpha + v$ for a fixed $\alpha > 0$) can be considered as the ch.f. of a certain distribution being the "Cramér transform" (see Chap. 9) of $H_0(t)$.




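Since $\tau_j \ge 0$, one has

$$H_0(x) = \sum_{j=0}^{\infty} P(T_j \le x).$$

As $H_0(t)$ has a unit jump at $t = 0$, we obtain

$$r(\lambda) = \int_0^\infty e^{i\lambda x}\,dH_0(x) = 1 + \sum_{j=0}^{\infty} \varphi_{\tau_1}(\lambda)\,\varphi^j(\lambda) = 1 + \frac{\varphi(\lambda) - 1}{i\lambda a} \cdot \frac{1}{1 - \varphi(\lambda)} = 1 - \frac{1}{i\lambda a}.$$

It is evident that this transform corresponds to the function $H_0(t) = 1 + t/a$. The theorem is proved.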


In the non-arithmetic case, one has the same connections between the homogeneous renewal process $\eta_0(t)$ and the limiting distribution of $\chi(t)$ and $\gamma(t)$ as we had in the arithmetic case. In the same way as in Sect. 10.3, we can derive from the renewal theorem the following.


Theorem 10.4.3 If $\tau \ge 0$ is non-arithmetic, $E\tau = a$, and the distribution of $\tau_1 \ge 0$ is arbitrary, then the following limit exists

$$\lim_{t \to \infty} P(\gamma(t) > u,\ \chi(t) > v) = \frac{1}{a} \int_{u+v}^{\infty} P(\tau > x)\,dx. \qquad (10.4.3)$$

Proof For $t > u$, by the total probability formula,

$$P(\gamma(t) > u,\ \chi(t) > v) = P(\tau_1 > t + v) + \sum_{j=1}^{\infty} \int_0^{t-u} P\big(\eta(t) = j + 1,\ T_j \in dx,\ \gamma(t) > u,\ \chi(t) > v\big)$$
$$= P(\tau_1 > t + v) + \sum_{j=1}^{\infty} \int_0^{t-u} P(T_j \in dx,\ \tau_{j+1} > t - x + v)$$
$$= P(\tau_1 > t + v) - P(\tau > t + v) + \int_0^{t-u} dH(x)\,P(\tau > t - x + v). \qquad (10.4.4)$$

Here the first two summands on the right-hand side converge to 0 as $t \to \infty$. By the renewal theorem for $g(x) = P(\tau > x + u + v)$ (see (10.4.1)), the last integral converges to

$$\frac{1}{a} \int_0^\infty P(\tau > x + u + v)\,dx.$$

The theorem is proved.


As was the case in the previous section (see Theorem 10.3.2), in the case $E\tau^2 = m_2 < \infty$ Theorem 10.4.3 allows us to refine the key renewal theorem.




Theorem 10.4.4 If all $\tau_j \stackrel{d}{=} \tau \ge 0$ are identically distributed and $E\tau^2 = m_2 < \infty$, then, as $t \to \infty$,

$$H(t) = \frac{t}{a} + \frac{m_2}{2a^2} + o(1).$$


Proof From (10.4.4) for $u = 0$ and Lemma 10.2.3 it follows that $\chi(t)$ are uniformly integrable, for

$$P(\chi(t) > v) = \int_0^t dH(x)\,P(\tau > t - x + v) \le (c_1 + c_2) \sum_{k \ge 0} P(\tau > k + v), \qquad (10.4.5)$$

and therefore by (4.4.3)

$$E\chi(t) \to \frac{1}{a} \int_0^\infty \int_v^\infty P(\tau > u)\,du\,dv = \frac{m_2}{2a}. \qquad (10.4.6)$$

It remains to make use of Wald's identity. The theorem is proved.
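The asymptotics of Theorem 10.4.4 can be compared with a numerical solution of the renewal equation. The Python sketch below is an illustration under stated assumptions: the jumps are taken to be Uniform(0, 1) (so $a = 1/2$, $m_2 = 1/3$), and the equation $H(t) = 1 + \int_{\max(0, t-1)}^{t} H(s)\,ds$ is discretised with the trapezoidal rule. The choice of distribution, the grid size and the function name are hypothetical and not taken from the text.

```python
import math

def renewal_function_uniform(t_max, steps_per_unit=1000):
    """Approximate H(t) on a grid for i.i.d. Uniform(0, 1) jumps.

    Solves H(t) = 1 + integral of H(s) over [max(0, t - 1), t] with the
    trapezoidal rule and returns the grid values H(n / steps_per_unit).
    """
    m = steps_per_unit
    d = 1.0 / m
    n_max = int(round(t_max * m))
    H = [1.0]            # H(0) = 1: the unit jump at t = 0
    prefix = [0.0, 1.0]  # prefix[k] = H[0] + ... + H[k-1]
    for n in range(1, n_max + 1):
        lo = max(0, n - m)
        interior = prefix[n] - prefix[lo + 1]  # H[lo+1] + ... + H[n-1]
        # The unknown H[n] enters the trapezoid with weight d/2; solve for it.
        h_n = (1.0 + d * (0.5 * H[lo] + interior)) / (1.0 - 0.5 * d)
        H.append(h_n)
        prefix.append(prefix[-1] + h_n)
    return H

a, m2 = 0.5, 1.0 / 3.0
H = renewal_function_uniform(5.0)
asymptote = 5.0 / a + m2 / (2 * a * a)  # t/a + m2/(2a^2) at t = 5
print(H[1000], math.e)   # on [0, 1] the exact solution is e^t
print(H[-1], asymptote)
```

On $[0, 1]$ the equation reduces to $H' = H$, $H(0) = 1$, which gives a convenient sanity check of the discretisation before looking at the large-$t$ behaviour.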


One can add to relation (10.4.6) that, under the conditions of Theorem 10.4.4, one has

$$E\chi^2(t) = o(t) \qquad (10.4.7)$$

as $t \to \infty$. Indeed, (10.4.5) and Lemma 10.2.3 imply

$$P(\chi(t) > v) \le (c_1 + c_2) \sum_{k < t} P(\tau > k + v) \le c \int_0^t P(\tau > z + v)\,dz.$$

Further, integrating by parts, we obtain

$$E\chi^2(t) = -\int_0^\infty v^2\,dP(\chi(t) > v) = 2\int_0^\infty v\,P(\chi(t) > v)\,dv \le 2c \int_0^t \int_0^\infty v\,P(\tau > z + v)\,dv\,dz, \qquad (10.4.8)$$

where the inner integral converges to zero as $z \to \infty$:

$$\int_0^\infty v\,P(\tau > z + v)\,dv = \frac{1}{2} \int_0^\infty v^2\,dP(\tau < z + v) \le \frac{1}{2} E(\tau^2;\ \tau > z) \to 0.$$

This and (10.4.8) imply (10.4.7).
Note also that if only $E\tau$ exists, then, by Theorem 10.1.1, we have $E\chi(t) = o(t)$ and, by Theorem 10.4.1 (or 10.4.3),

$$P(\chi(t) > v) \to \frac{1}{a} \int_0^\infty P(\tau > u + v)\,du.$$




Now let, as before, $\gamma$ and $\chi$ denote random variables distributed according to the limiting distribution (10.4.3). Similarly to the above, it is not hard to establish that if $E\tau^k < \infty$, $k \ge 1$, then, as $t \to \infty$,

$$E\chi^{k-1}(t) \to E\chi^{k-1} < \infty, \qquad E\chi^{k}(t) = o(t).$$

Further, it is seen from Theorem 10.4.3 that each of the random variables $\gamma$ and $\chi$ has density equal to $a^{-1} P(\tau > x)$. The joint distribution of $\gamma$ and $\chi$ may have no density. If $\tau$ has density $f(x)$ then there exists a joint density of $\gamma$ and $\chi$ equal to $a^{-1} f(x + y)$. It also follows from Theorem 10.4.3 that $\gamma$ and $\chi$ are independent if and only if

$$\int_x^\infty P(\tau > u)\,du = \frac{1}{\alpha} e^{-\alpha x}$$

for some $\alpha > 0$, i.e. independence takes place only for the exponential distribution $\tau \mathrel{\subset\!=} \Gamma_\alpha$.

Moreover, for homogeneous renewal processes the coincidence of $P(\tau_1 > x)$ and $P(\tau > x)$ takes place only when $\tau \mathrel{\subset\!=} \Gamma_\alpha$. In other words, the renewal process generated by a sequence of identically distributed random variables $\tau_1, \tau_2, \ldots$ will be homogeneous if and only if $\tau_j \mathrel{\subset\!=} \Gamma_\alpha$. In that case $\eta_0(t)$ is called (see also Sect. 19.4) a Poisson process. This is because for such a process, for each $t$, the variable $\nu(t) = \eta_0(t) - 1$ has the Poisson distribution with parameter $t/a$.

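The Poisson process has some other remarkable properties as well (see also Sect. 19.4). Clearly, one has $\chi(t) \mathrel{\subset\!=} \Gamma_\alpha$ for such a process, and moreover, the variables $\gamma(t)$ and $\chi(t)$ are independent. Indeed, by (10.4.4), taking into account that $H(x)$ has a jump of magnitude 1 at the point $x = 0$, we obtain for $u < t$ that

$$P(\gamma(t) > u,\ \chi(t) > v) = e^{-\alpha(t+v)} + \int_0^{t-u} \alpha e^{-\alpha(t - x + v)}\,dx = e^{-\alpha(u+v)} = P(\gamma(t) > u)\,P(\chi(t) > v);$$
$$P(\gamma(t) = t,\ \chi(t) > v) = P(\tau_1 > t + v) = e^{-\alpha(t+v)} = P(\gamma(t) = t)\,P(\chi(t) > v);$$
$$P(\gamma(t) > t) = 0.$$

These relations also imply that the random variable $\tau_{\eta(t)} = \gamma(t) + \chi(t)$ has the same distribution as $\min(t, \tau_1) + \tau_2$, where $\tau_j \mathrel{\subset\!=} \Gamma_\alpha$, $j = 1, 2$, are independent, so that the distribution of $\tau_{\eta(t)}$ converges to $\Gamma_{\alpha,2}$ as $t \to \infty$.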

The fact that $\gamma(t)$ and $\chi(t)$ are independent of each other deserves attention from the point of view of its interpretation. It means the following. The residual lifetime of the component operating at a given time $t$ has the same distribution as the lifetime of a new component (recall that $\tau_j \mathrel{\subset\!=} \Gamma_\alpha$) and is independent of how long this component has already been working (which at first glance is a paradox). Since the lifetime distributions of devices consisting of large numbers of reliable elements are close to the exponential law (see Theorem 20.3.2), the above-mentioned fact is of significant practical interest.

If $\tau_j$ can assume negative values as well, the problems related to the distributions of $\gamma(t)$ and $\chi(t)$ become much more complicated. To some extent such problems



can be reduced to the case of nonnegative variables, since the distribution of $\chi(t)$ coincides with that of the variable $\chi^*(t)$ constructed from a sequence $\{\tau^*_j \ge 0\}$, where $\tau^*_j$ have the same distribution as $\chi(0)$. The distribution of $\chi(0)$ can be found using the methods of Chap. 12.

In particular, for random variables $\tau_1, \tau_2, \ldots$ taking values of both signs, Theorems 10.4.1 and 10.4.3 imply the following assertion.


Corollary 10.4.1 Let $\tau_1, \tau_2, \ldots$ be non-arithmetic independent and identically distributed and $E\tau_1 = a$. Then the following limit exists

$$\lim_{t \to \infty} P(\chi(t) > v) = \frac{1}{E\chi(0)} \int_v^\infty P(\chi(0) > t)\,dt, \qquad v > 0.$$

For arithmetic $\tau_j$,

$$\lim_{k \to \infty} P(\chi(k) = i) = \frac{1}{E\chi(0)}\,P(\chi(0) \ge i), \qquad i > 0.$$


10.5 The Law of Large Numbers and the Central Limit 
Theorem for Renewal Processes 


In this section we return to the general case where $\tau_j$ are not necessarily identically distributed (cf. Sect. 10.1).


10.5.1 The Law of Large Numbers 
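First assume that $\tau_j \ge 0$ and put

$$a_k := E\tau_k, \qquad A_n := \sum_{k=1}^{n} a_k.$$


Theorem 10.5.1 Let $\tau_k \ge 0$ be independent, $\tau_k - a_k$ uniformly integrable, and $n^{-1} A_n \to a > 0$ as $n \to \infty$. Then, as $t \to \infty$,

$$\frac{\eta(t)}{t} \stackrel{p}{\to} \frac{1}{a}.$$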
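Proof The basic relation we shall use is the equality
$$\{\eta(t) > n\} = \{T_n \le t\}, \qquad (10.5.1)$$

which implies

$$P\Big(\frac{\eta(t)}{t} - \frac{1}{a} > \frac{\varepsilon}{a}\Big) = P\Big(\eta(t) > \frac{t}{a}(1 + \varepsilon)\Big) = P(T_n \le t),$$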



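where for simplicity we assume that $n = \frac{t}{a}(1 + \varepsilon)$ is an integer. Further,

$$P(T_n \le t) = P\Big(\frac{T_n}{n} \le \frac{a}{1 + \varepsilon}\Big) = P\Big(\frac{T_n - A_n}{n} \le \frac{a}{1 + \varepsilon} - \frac{A_n}{n}\Big) \le P\Big(\frac{T_n - A_n}{n} \le -\frac{a\varepsilon}{2}\Big)$$

for $n$ large enough and $\varepsilon$ small enough. Applying the law of large numbers to the right-hand side of this relation (Theorem 8.3.3), we obtain that, for any $\varepsilon > 0$, as $t \to \infty$,

$$P\Big(\frac{\eta(t)}{t} - \frac{1}{a} > \frac{\varepsilon}{a}\Big) \to 0.$$

The probability $P\big(\frac{\eta(t)}{t} - \frac{1}{a} < -\frac{\varepsilon}{a}\big)$ can be bounded in the same way. The theorem is proved.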


10.5.2 The Central Limit Theorem 


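Put

$$\sigma_k^2 := E(\tau_k - a_k)^2 = \operatorname{Var}\tau_k, \qquad B_n^2 := \sum_{k=1}^{n} \sigma_k^2.$$


Theorem 10.5.2 Let $\tau_k \ge 0$ and the random variables $\tau_k - a_k$ satisfy the Lindeberg condition: for any $\delta > 0$ and $n \to \infty$,

$$\sum_{k=1}^{n} E\big(|\tau_k - a_k|^2;\ |\tau_k - a_k| > \delta B_n\big) = o(B_n^2).$$

Let, moreover, there exist $a > 0$ and $\sigma > 0$ such that, as $n \to \infty$,

$$A_n := \sum_{k=1}^{n} a_k = an + o(\sqrt{n}), \qquad B_n^2 = \sigma^2 n + o(n). \qquad (10.5.2)$$

Then the distribution of

$$\frac{\eta(t) - t/a}{\sigma\sqrt{t/a^3}} \qquad (10.5.3)$$

converges to the standard normal law $\Phi_{0,1}$ as $t \to \infty$.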

Proof From (10.5.1) we have

$$P(\eta(t) > n) = P(T_n \le t) = P\Big(\frac{T_n - A_n}{B_n} \le \frac{t - A_n}{B_n}\Big). \qquad (10.5.4)$$

Let $n$ vary as $t \to \infty$ so that
$$\frac{t - A_n}{B_n} \to v$$



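for a fixed $v$. To find such an $n$, solve for $n$ the equation
$$\frac{t - an}{\sigma\sqrt{n}} = v.$$

This is a quadratic equation in $\sqrt{n}$, and its solution has the form

$$n = \frac{t}{a} \pm \frac{v\sigma}{a}\sqrt{\frac{t}{a}} + O(1). \qquad (10.5.5)$$

For such $n$, by (10.5.2),

$$\frac{t - A_n}{B_n} = \frac{\mp v\sigma\sqrt{t/a} + o(\sqrt{t})}{\sigma\sqrt{t/a}\,(1 + o(1))} = \mp v + o(1).$$

This equality means that we have to choose the minus sign in (10.5.5). Therefore, by (10.5.4) and the central limit theorem,

$$P(\eta(t) > n) = P\Big(\frac{\eta(t) - t/a}{\sigma\sqrt{t/a^3}} > -v + o(1)\Big) \to \Phi(v) = 1 - \Phi(-v).$$

Changing $-v$ to $u$, by the continuity theorems (see Lemma 6.2.2) we get

$$P\Big(\frac{\eta(t) - t/a}{\sigma\sqrt{t a^{-3}}} < u\Big) \to \Phi(u).$$

The theorem is proved.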
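The centring $t/a$ and the scaling $\sigma\sqrt{t/a^3}$ in Theorems 10.5.1 and 10.5.2 can be checked exactly in a small arithmetic example. The Python sketch below assumes i.i.d. jumps equal to 1 or 2 with probability 1/2 each (an illustrative choice, as are the function name and the value of $t$). It computes the mean and variance of $\eta(t)$ from the relation $P(\eta(t) > n) = P(T_n \le t)$ of (10.5.1), propagating the distribution of $T_n$ by convolution.

```python
def eta_mean_var(p, t):
    """Exact mean and variance of eta(t) = min{k : T_k > t} for integer jumps >= 1."""
    dist = [0.0] * (t + 1)  # dist[s] = P(T_n = s) for s <= t
    dist[0] = 1.0           # T_0 = 0
    mean = second = 0.0
    n = 0
    while True:
        tail = sum(dist)    # P(T_n <= t) = P(eta(t) > n)
        if tail == 0.0:     # happens once n > t, since every jump is >= 1
            break
        mean += tail                    # E eta = sum_n P(eta > n)
        second += (2 * n + 1) * tail    # E eta^2 = sum_n (2n + 1) P(eta > n)
        new = [0.0] * (t + 1)
        for s, q in enumerate(dist):
            if q:
                for i, pi in p.items():
                    if s + i <= t:
                        new[s + i] += q * pi
        dist = new
        n += 1
    return mean, second - mean * mean

p = {1: 0.5, 2: 0.5}   # assumed toy law
a = 1.5                # E tau
sigma2 = 0.25          # Var tau
t = 600
mean, var = eta_mean_var(p, t)
print(mean / t, 1 / a)
print(var, sigma2 * t / a ** 3)
```

The ratio `mean / t` should be close to $1/a$, and the variance close to $\sigma^2 t / a^3$, up to bounded correction terms that become negligible relative to $t$.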


Remark 10.5.1 In Theorems 10.5.1 and 10.5.2 we considered the case where $A_n$ grows asymptotically linearly as $n \to \infty$. Then the centring parameter $t/a$ for $\eta(t)$ changes asymptotically linearly as well. However, nothing prevents us from considering a more general case where, say, $A_n \sim cn^{\alpha}$, $\alpha > 0$. Then the centring parameter for $\eta(t)$ will be the solution to the equation $cn^{\alpha} = t$, i.e. the function $(t/c)^{1/\alpha}$ (under the conditions of Theorem 10.5.2, in this case we have to assume that $B_n = o(A_n)$). The asymptotics of the renewal function will have the same form.

In order to extend the assertions of Theorems 10.5.1 and 10.5.2 to $\tau_j$ assuming values of both signs, we need some auxiliary assertions that are also of independent interest.


10.5.3 A Theorem on the Finiteness of the Infimum of the 
Cumulative Sums 


In this subsection we will consider identically distributed independent random variables $\tau_1, \tau_2, \ldots$. We first state the following simple assertion in the form of a lemma.


Lemma 10.5.1 One has $E|\tau| < \infty$ if and only if

$$\sum_{j=1}^{\infty} P(|\tau| > j) < \infty.$$




The Proof follows in an obvious way from the equality

$$E|\tau| = \int_0^\infty P(|\tau| > x)\,dx$$

and the inequalities

$$\sum_{j=1}^{\infty} P(|\tau| > j) \le \int_0^\infty P(|\tau| > x)\,dx \le 1 + \sum_{j=1}^{\infty} P(|\tau| > j).$$

Let, as before,

$$T_n = \sum_{j=1}^{n} \tau_j.$$


Theorem 10.5.3 If $\tau_j \stackrel{d}{=} \tau$ are identically distributed and independent and $E\tau > 0$, then the random variable $Z := \inf_{k \ge 0} T_k$ is proper (finite with probability 1).


Proof Let $\eta_1 = \eta(1)$ be the number of the first sum $T_k$ to exceed level 1. Consider the sequence $\{\tau^*_k = \tau_{\eta_1 + k}\}$ that, by Lemma 10.2.1, has the same distribution as $\{\tau_k\}$ and is independent of $\eta_1, \tau_1, \ldots, \tau_{\eta_1}$. For this sequence, denote by $\eta_2$ the subscript $k$ for which the sum $T^*_k = \sum_{j=1}^{k} \tau^*_j$ first exceeds level 1. It is clear that the random variables $\eta_1$ and $\eta_2$ are identically distributed and independent. Next, construct for the sequence $\{\tau^{**}_k = \tau_{\eta_1 + \eta_2 + k}\}$ the random variable $\eta_3$ following the same rule, and so on. As a result we will obtain a sequence of Markov times $\eta_1, \eta_2, \ldots$ that determine the times of "renewals" of the original sequence $\{T_k\}$, associated with attaining level 1.
Now set
$$Z_1 := \min_{k \le \eta_1} T_k, \qquad Z_2 := \min_{k \le \eta_2} T^*_k, \ \ldots$$
Clearly, the $Z_j$ are identically distributed and
$$Z = \inf\{Z_1,\ T_{\eta_1} + Z_2,\ T_{\eta_1 + \eta_2} + Z_3, \ldots\},$$

where by definition $T_{\eta_1} > 1$, $T_{\eta_1 + \eta_2} > 2$ and so on. Hence

$$\{Z < -N\} = \bigcup_{k=0}^{\infty} \{Z_{k+1} + T_{\eta_1 + \cdots + \eta_k} < -N\} \subset \bigcup_{k=0}^{\infty} \{Z_{k+1} + k < -N\},$$
$$P(Z < -N) \le \sum_{k=0}^{\infty} P(Z_1 + k < -N) = \sum_{j=N}^{\infty} P(Z_1 < -j).$$

This expression tends to 0 as $N \to \infty$ provided that $E|Z_1| < \infty$ (see Lemma 10.5.1). It remains to verify the finiteness of $EZ_1$, which follows from the finiteness of $E\eta_1 = E\eta(1) = H(1) < c$ (see Example 4.4.5) and the relations
$$E|Z_1| \le E\sum_{j=1}^{\eta_1} |\tau_j| = E\eta_1\,E|\tau_1| < \infty$$

(see Theorem 4.4.2).




10.5.4 Stochastic Inequalities. The Law of Large Numbers and the 
Central Limit Theorem for the Maximum of Sums of 
Non-identically Distributed Random Variables Taking 
Values of Both Signs 


In this subsection we extend the assertions of some theorems of Chap. 8 to maxima of sums of random variables with a positive "mean drift". To do this we will have to introduce some additional restrictions that are always satisfied when the summands are identically distributed. Here we will need the notion of stochastic inequalities (or inequalities in distribution). Let $\xi$ and $\zeta$ be given random variables.


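Definition 10.5.1 We will say that $\zeta$ majorises (minorises) $\xi$ in distribution and denote this by $\xi \stackrel{d}{\le} \zeta$ ($\xi \stackrel{d}{\ge} \zeta$) if, for all $t$,
$$P(\xi \ge t) \le P(\zeta \ge t) \qquad \big(P(\xi \ge t) \ge P(\zeta \ge t)\big).$$

Clearly, if $\xi \stackrel{d}{\le} \zeta$ then $-\xi \stackrel{d}{\ge} -\zeta$. We show that stochastic inequalities possess some other properties of ordinary inequalities.

Lemma 10.5.2 If $\{\xi_k\}_{k=1}^{\infty}$ and $\{\zeta_k\}_{k=1}^{\infty}$ are sequences of independent (in each sequence) random variables and $\xi_k \stackrel{d}{\le} \zeta_k$, then, for all $n$,

$$S_n \stackrel{d}{\le} Z_n, \qquad \overline{S}_n \stackrel{d}{\le} \overline{Z}_n,$$

where

$$S_n := \sum_{k=1}^{n} \xi_k, \qquad Z_n := \sum_{k=1}^{n} \zeta_k, \qquad \overline{S}_n := \max_{k \le n} S_k, \qquad \overline{Z}_n := \max_{k \le n} Z_k.$$

Similarly, if $\xi_k \stackrel{d}{\ge} \zeta_k$, then $\min_{k \le n} S_k \stackrel{d}{\ge} \min_{k \le n} Z_k$.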


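Proof Let $F_k(t) := P(\xi_k < t)$ and $G_k(t) := P(\zeta_k < t)$. Using quantile transformations $F_k^{(-1)}$ and $G_k^{(-1)}$ (see Definition 3.2.4) and a sequence of independent random variables $\{\omega_k\}_{k=1}^{\infty}$, $\omega_k \mathrel{\subset\!=} U_{0,1}$, we can construct on a common probability space the sequences $\xi^*_k = F_k^{(-1)}(\omega_k)$ and $\zeta^*_k = G_k^{(-1)}(\omega_k)$ such that $\xi^*_k \stackrel{d}{=} \xi_k$ and $\zeta^*_k \stackrel{d}{=} \zeta_k$ (the distributions of $\xi^*_k$ and $\xi_k$ and of $\zeta^*_k$ and $\zeta_k$ coincide). Moreover, $\xi^*_k \le \zeta^*_k$, which is a direct consequence of the inequality $F_k(t) \ge G_k(t)$ for all $t$. Endowing with the superscript $*$ all the notations for sums and maximum of sums of random variables with asterisks, we obviously obtain that

$$S_n \stackrel{d}{=} S^*_n \le Z^*_n \stackrel{d}{=} Z_n, \qquad \overline{S}_n \stackrel{d}{=} \overline{S}^*_n \le \overline{Z}^*_n \stackrel{d}{=} \overline{Z}_n.$$

The last assertion of the lemma follows from the previous ones. The lemma is proved.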
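The quantile coupling used in this proof can be written out in a few lines. The Python sketch below is a minimal illustration under assumed distributions: $\xi_k$ is a shifted exponential with rate 2 and $\zeta_k$ a shifted exponential with rate 1 (the shifts $-1$ and $-0.5$ are chosen so that the variables take values of both signs and the quantile functions are ordered). These laws, the seed and the function names are hypothetical and not from the text.

```python
import math
import random
from itertools import accumulate

def coupled_sequences(n, seed=0):
    """Build xi*_k and zeta*_k from the same uniform variables via quantile functions."""
    rng = random.Random(seed)
    xi, zeta = [], []
    for _ in range(n):
        u = rng.random()             # omega_k, uniform on [0, 1)
        q = -math.log(1.0 - u)       # standard exponential quantile at u
        xi.append(q / 2.0 - 1.0)     # quantile of the assumed law of xi
        zeta.append(q - 0.5)         # quantile of the assumed law of zeta
    return xi, zeta

xi, zeta = coupled_sequences(1000)
S = list(accumulate(xi))             # partial sums S*_1, ..., S*_n
Z = list(accumulate(zeta))           # partial sums Z*_1, ..., Z*_n
S_max = list(accumulate(S, max))     # running maxima of S*
Z_max = list(accumulate(Z, max))     # running maxima of Z*

print(all(x <= z for x, z in zip(xi, zeta)))
print(all(s <= z for s, z in zip(S, Z)), all(s <= z for s, z in zip(S_max, Z_max)))
```

Because both sequences are driven by the same uniform variables, the inequalities hold for every realisation, not just in distribution; this is exactly what the coupling argument exploits.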


Below we will need the following corollary of Theorem 10.5.3. 




Lemma 10.5.3 Let $\xi_k$ be independent, $\xi_k \stackrel{d}{\ge} \zeta$ for all $k$ and $E\zeta > 0$. Then, for all $n$, the random variable

$$D_n := \overline{S}_n - S_n \ge 0$$

is majorised in distribution by the random variable $-Z$: $D_n \stackrel{d}{\le} -Z$, where $Z := \inf_k Z_k$, $Z_k := \sum_{j=1}^{k} \zeta_j$ and $\zeta_j$ are independent copies of $\zeta$.


Proof We have

$$\overline{S}_n = \max(0, S_1, \ldots, S_n) = S_n + \max(0, -\xi_n, -\xi_n - \xi_{n-1}, \ldots, -S_n) = S_n - \min(0, \xi_n, \xi_n + \xi_{n-1}, \ldots, S_n),$$

where, by the last assertion of Lemma 10.5.2,
$$-D_n = \min(0, \xi_n, \xi_n + \xi_{n-1}, \ldots, S_n) \stackrel{d}{\ge} \min_{k \le n} Z_k \ge Z, \qquad D_n \stackrel{d}{\le} -Z.$$

The fact that $Z$ is a proper random variable follows from Theorem 10.5.3 on the finiteness of the infimum of partial sums. The lemma is proved.
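The algebraic identity at the start of this proof can be verified on any concrete sequence. The Python sketch below (the sample values and function names are illustrative assumptions) computes the maximum of the partial sums both directly and through the minimum of the sums taken in reversed order.

```python
from itertools import accumulate

def running_max(xi):
    """max(0, S_1, ..., S_n) computed directly from the partial sums."""
    return max([0] + list(accumulate(xi)))

def max_via_reversed_min(xi):
    """S_n - min(0, xi_n, xi_n + xi_{n-1}, ..., S_n)."""
    s_n = sum(xi)
    reversed_partial = [0] + list(accumulate(reversed(xi)))
    return s_n - min(reversed_partial)

xi = [3, -5, 2, -1, 4, -6]   # arbitrary sample values
s_n = sum(xi)
d_n = running_max(xi) - s_n  # the variable D_n of Lemma 10.5.3
print(running_max(xi), max_via_reversed_min(xi), d_n)  # 3 3 6
```

The second form is the useful one: it expresses $D_n$ through partial sums of the summands taken in reverse order, to which the last assertion of Lemma 10.5.2 applies.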


If $\xi_k \stackrel{d}{=} \xi$ are identically distributed and $a = E\xi > 0$, then we can put $\xi = \zeta$. The above reasoning shows that in this case the limit distribution of $\overline{S}_n - S_n$ as $n \to \infty$ exists and coincides with the distribution of the random variable $-Z$ (the random variables $\overline{S}_n - S_n$ themselves do not have a limit, and, by the way, neither do the variables $\frac{S_n - an}{\sigma\sqrt{n}}$ in the central limit theorem).


Lemma 10.5.3 shows that, for $\xi_k \stackrel{d}{\ge} \zeta$ and $E\zeta > 0$, the random variables $\overline{S}_n$ and $S_n$ differ from each other by a proper random variable only. This makes the limit theorems for $\overline{S}_n$ and $S_n$ essentially the same.

We proceed to the law of large numbers and the central limit theorem for $\overline{S}_n$.


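Theorem 10.5.4 Let $a_k = E\xi_k > 0$, $A_n = \sum_{k=1}^{n} a_k$ and $A_n \sim an$ as $n \to \infty$, $a > 0$. Let, moreover, $\xi_k - a_k$ be uniformly integrable for all $k$ and $\xi_k \stackrel{d}{\ge} \zeta$ with $E\zeta > 0$. Then, as $n \to \infty$,

$$\frac{\overline{S}_n}{n} \stackrel{p}{\to} a.$$

Note that the left uniform integrability of $\xi_k - a_k$ follows from the inequalities $\xi_k \stackrel{d}{\ge} \zeta$.

Proof By Lemma 10.5.3,

$$\overline{S}_n = S_n + D_n, \quad \text{where } D_n \ge 0,\ D_n \stackrel{d}{\le} -Z. \qquad (10.5.6)$$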




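Therefore,

$$\frac{\overline{S}_n}{n} = \frac{S_n - A_n}{n} + \frac{A_n}{n} + \frac{D_n}{n},$$
where by Theorem 8.3.3, as $n \to \infty$,
$$\frac{S_n - A_n}{n} \stackrel{p}{\to} 0.$$

It is also clear that

$$\frac{A_n}{n} \to a, \qquad \frac{D_n}{n} \stackrel{p}{\to} 0.$$

The theorem is proved.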


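In addition to the notation from Theorem 10.5.3, put
$$\sigma_k^2 := E(\xi_k - a_k)^2, \qquad B_n^2 := \sum_{k=1}^{n} \sigma_k^2.$$

Theorem 10.5.5 Let, for some $a > 0$ and $\sigma > 0$,
$$A_n = an + o(\sqrt{n}), \qquad B_n^2 = \sigma^2 n + o(n),$$

and let the random variables $\xi_k - a_k$ satisfy the Lindeberg condition, $\xi_k \stackrel{d}{\ge} \zeta$ with $E\zeta > 0$. Then the distribution of

$$\frac{\overline{S}_n - an}{\sigma\sqrt{n}} \qquad (10.5.7)$$

converges to the standard normal law $\Phi_{0,1}$.


Proof By virtue of (10.5.6),
$$\frac{\overline{S}_n - an}{\sigma\sqrt{n}} = \frac{S_n - A_n}{B_n} \cdot \frac{B_n}{\sigma\sqrt{n}} + \frac{A_n - an}{\sigma\sqrt{n}} + \frac{D_n}{\sigma\sqrt{n}}, \qquad (10.5.8)$$
where, by the central limit theorem, the distribution of $\frac{S_n - A_n}{B_n}$ converges to $\Phi_{0,1}$.

Moreover,

$$\frac{B_n}{\sigma\sqrt{n}} \to 1, \qquad \frac{A_n - an}{\sigma\sqrt{n}} \to 0, \qquad \frac{D_n}{\sigma\sqrt{n}} \stackrel{p}{\to} 0.$$

This and (10.5.8) imply (10.5.7). The theorem is proved.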


10.5.5 Extension of Theorems 10.5.1 and 10.5.2 to Random 
Variables Assuming Values of Both Signs 


We return to renewal processes and limit theorems for them. In Theorems 10.5.1 
and 10.5.2 we obtained the law of large numbers and the central limit theorem for 




the renewal process $\eta(t)$ defined in (10.1.1) with jumps $\tau_j \ge 0$. Now we drop the last assumption and assume that $\tau_j$ can take values of both signs.


Theorem 10.5.6 Let the conditions of Theorem 10.5.1 be met, the condition $\tau_k \ge 0$ being replaced with the condition $\tau_k \stackrel{d}{\ge} \zeta$ with $E\zeta > 0$. Then

$$\frac{\eta(t)}{t} \stackrel{p}{\to} \frac{1}{a}. \qquad (10.5.9)$$


If $\tau_k \stackrel{d}{=} \tau$ are identically distributed and $E\tau > 0$, then we can put $\zeta = \tau$. Therefore Theorem 10.5.6 implies the following result.


Corollary 10.5.1 If $\tau_j$ are independent and identically distributed and $E\tau = a > 0$, then (10.5.9) holds true.


Proof of Theorem 10.5.6 Here instead of (10.5.1) we should use the relation
$$\{\eta(t) > n\} = \{\overline{T}_n \le t\}, \qquad \overline{T}_n := \max_{k \le n} T_k, \qquad T_k := \sum_{j=1}^{k} \tau_j. \qquad (10.5.10)$$

Then we repeat the argument from the proof of Theorem 10.5.1, changing in it $T_n$ to $\overline{T}_n$ and using Theorem 10.5.4, which implies that $\overline{T}_n$ and $T_n$ satisfy the law of large numbers. The theorem is proved.


Theorem 10.5.7 Let the conditions of Theorem 10.5.2 be met, the condition $\tau_k \ge 0$ being replaced with the condition $\tau_k \stackrel{d}{\ge} \zeta$ with $E\zeta > 0$. Then (10.5.3) holds true.


Proof Here we again have to use (10.5.10), instead of (10.5.1), and then repeat the argument proving Theorem 10.5.2 using Theorem 10.5.5, which implies that the distribution of $\frac{\overline{T}_n - an}{\sigma\sqrt{n}}$, as well as the distribution of $\frac{T_n - an}{\sigma\sqrt{n}}$, converges to the standard normal law $\Phi_{0,1}$. The theorem is proved.


Remark 10.5.2 (An analogue of Remarks 8.3.3, 8.4.1 and 10.1.1) The assertions of Theorems 10.5.6 and 10.5.7 can be generalised as follows. Let $\tau_1$ be an arbitrary random variable and random variables $\tau^*_k := \tau_{1+k}$, $k \ge 1$, satisfy the conditions of Theorem 10.5.6 (Theorem 10.5.7). Then convergence (10.5.9) ((10.5.3)) still takes place.

Consider, for example, Theorem 10.5.7. Denote by $A_t$ the event

$$A_t := \Big\{\frac{\eta(t) - t/a}{\sigma\sqrt{t/a^3}} < x\Big\}.$$

Then the foregoing assertion follows from the relations

$$P(A_t) = E\big[P(A_t \mid \tau_1);\ |\tau_1| \le N\big] + r_N,$$




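where $r_N \le P(|\tau_1| > N)$ can be made arbitrarily small by the choice of $N$, and by Theorem 10.5.7

$$P(A_t \mid \tau_1) = P\Big(\frac{\eta^*(t - \tau_1) - (t - \tau_1)/a}{\sigma\sqrt{(t - \tau_1)/a^3}} < x + O\Big(\frac{1}{\sqrt{t}}\Big)\Big) \to \Phi(x)$$

as $t \to \infty$ for each fixed $\tau_1$, $|\tau_1| \le N$. Here $\eta^*(t)$ is the renewal process that corresponds to the sequence $\{\tau^*_k\}$.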


10.5.6 The Local Limit Theorem 


If we again narrow our assumptions and return to identically distributed $\tau_j \overset{d}{=} \tau \ge 0$, then we can derive local theorems more precise than Theorem 10.5.2. In this subsection we will find an asymptotic representation for $\mathbf{P}(\eta(t) = n)$ as $t \to \infty$. We know from Theorem 10.5.2 what range of values of $n$ the bulk of the distribution of $\eta(t)$ is concentrated in. Therefore we will from the start consider not arbitrary $n$, but the values of $n$ that can be represented as


$$n = \left[ \frac{t}{a} + v\sigma\sqrt{\frac{t}{a^3}} \right], \qquad \sigma^2 = \operatorname{Var}(\tau), \tag{10.5.11}$$

for “proper” values of $v$ ($[s]$ in (10.5.11) is the integer part of $s$), so that

$$\frac{t - an}{\sigma\sqrt{n}} = -v + O\!\left(\frac{1}{\sqrt{t}}\right) \tag{10.5.12}$$

(see (10.5.5)). For the proof, it will be more convenient to consider the probabilities $\mathbf{P}(\eta(t) = n + 1)$. Changing $n + 1$ to $n$ amends nothing in the argument below.


Theorem 10.5.8 If $\tau \ge 0$ is either non-lattice or arithmetic and $\operatorname{Var}(\tau) = \sigma^2 < \infty$, then, for the values of $n$ defined in (10.5.11), as $t \to \infty$,

$$\mathbf{P}(\eta(t) = n + 1) \sim \frac{a^{3/2}}{\sigma\sqrt{2\pi t}}\, e^{-v^2/2}, \tag{10.5.13}$$

where in the arithmetic case $t$ is assumed to be integer.


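Proof First let, for simplicity, $\tau$ have a density and satisfy the conditions of the local limit Theorem 8.7.2. Then

$$\mathbf{P}(\eta(t) = n + 1) = \int_0^t \mathbf{P}(T_n \in du)\, \mathbf{P}(\tau > t - u), \tag{10.5.14}$$

where by Theorem 8.7.2, as $n \to \infty$,

$$\mathbf{P}\bigl(T_n - na \in d(u - na)\bigr) = \frac{du}{\sigma\sqrt{2\pi n}} \left[ \exp\left\{ -\frac{(u - na)^2}{2n\sigma^2} \right\} + o(1) \right]$$

uniformly in $u$. Change the variable $u = t - z$. Since for the values of $n$ we are dealing with one has (10.5.12), the exponential

$$\exp\left\{ -\frac{(u - na)^2}{2n\sigma^2} \right\} = \exp\left\{ -\frac{1}{2}\left( v + \frac{z}{\sigma\sqrt{n}} + O\!\left(\frac{1}{\sqrt{t}}\right) \right)^{2} \right\}$$

remains “almost constant” and asymptotically equivalent to $e^{-v^2/2}$ for $|z| < N$, $N \to \infty$, $N = o(\sqrt{n})$. Hence the integral in (10.5.14) is asymptotically equivalent to

$$\frac{1}{\sigma\sqrt{2\pi n}}\, e^{-v^2/2} \int_0^N \mathbf{P}(\tau > z)\, dz \sim \frac{a}{\sigma\sqrt{2\pi n}}\, e^{-v^2/2}.$$

Since $n \sim t/a$ as $t \to \infty$, we obtain (10.5.13).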

If $\tau$ has no density, but is non-lattice, then we should use the integro-local Theorem 8.7.1 for small $\Delta$ and, in a quite similar fashion, bound the integral in (10.5.14) (with $t$ a multiple of $\Delta$) from above and from below by the sums

$$\sum_{k=0}^{t/\Delta - 1} \mathbf{P}(T_n \in \Delta[k\Delta))\, \mathbf{P}(\tau > t - (k + 1)\Delta)$$

and

$$\sum_{k=0}^{t/\Delta - 1} \mathbf{P}(T_n \in \Delta[k\Delta))\, \mathbf{P}(\tau > t - k\Delta),$$

respectively. For small $\Delta$ both bounds will be close to the right-hand side of (10.5.13).

If $\tau$ has an arithmetic distribution then we have to replace integral (10.5.14) with the corresponding sum and, for integer $u$ and $t$, make use of Theorem 8.7.3. The theorem is proved.


If we examine the arguments in the proof concerning the behaviour of the correction term, then, in addition to (10.5.13), we can also obtain the representation

$$\mathbf{P}(\eta(t) = n) = \frac{a^{3/2}}{\sigma\sqrt{2\pi t}}\, e^{-v^2/2} + o\!\left(\frac{1}{\sqrt{t}}\right) \tag{10.5.15}$$

uniformly in $v$ (or in $n$).
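The local approximation (10.5.13) can be checked in the one case where the left-hand side is known in closed form. The sketch below assumes exponential inter-arrival times with unit rate (so $a = \sigma = 1$ and the number of renewals in $[0, t]$ is Poisson with parameter $t$); this choice and the function names are assumptions made for the illustration.

```python
import math


def exact_prob(t, n):
    """P(eta(t) = n + 1) for Exp(1) inter-arrival times.

    eta(t) = n + 1 means T_n <= t < T_{n+1}, i.e. exactly n renewals
    in [0, t], which has the Poisson probability exp(-t) t^n / n!.
    The log form avoids overflow for large t and n.
    """
    return math.exp(-t + n * math.log(t) - math.lgamma(n + 1))


def local_approx(t, n, a=1.0, sigma=1.0):
    """Right-hand side of (10.5.13), with v recovered from n via (10.5.11)."""
    v = (n - t / a) / (sigma * math.sqrt(t / a ** 3))
    return a ** 1.5 / (sigma * math.sqrt(2.0 * math.pi * t)) * math.exp(-v * v / 2.0)


if __name__ == "__main__":
    t = 10000.0
    for v in (-1.0, 0.0, 0.5, 2.0):
        n = int(t + v * math.sqrt(t))
        print(v, exact_prob(t, n) / local_approx(t, n))
```

For large $t$ the printed ratios should be close to 1, with a deviation of order $1/\sqrt{t}$.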


10.6 Generalised Renewal Processes 


10.6.1 Definition and Some Properties 


Let, instead of the sequence $\{\tau_j\}_{j=1}^{\infty}$, there be given a sequence of two-dimensional independent vectors $(\tau_j, \xi_j)$, $\tau_j \ge 0$, having the same distribution as $(\tau, \xi)$. Let, as before,

$$S_k = \sum_{j=1}^{k} \xi_j, \qquad T_k = \sum_{j=1}^{k} \tau_j, \qquad S_0 = T_0 = 0,$$

$$\eta(t) = \min\{k : T_k > t\}, \qquad \nu(t) = \max\{k : T_k \le t\} = \eta(t) - 1.$$


Definition 10.6.1 The process

$$S_{(\nu)}(t) = qt + S_{\nu(t)}$$

is called a generalised renewal process with linear drift $q$.


The process $S_{(\nu)}(t)$, as well as $\nu(t)$, is right-continuous. Clearly, $S_{(\nu)}(t) = qt$ for $t < \tau_1$. At time $t = \tau_1$ the first jump in the process $S_{(\nu)}(t)$ occurs, which is of size $\xi_1$:

$$S_{(\nu)}(\tau_1 - 0) = q\tau_1, \qquad S_{(\nu)}(\tau_1) = q\tau_1 + \xi_1.$$

After that, on the interval $[T_1, T_2)$ the value of $S_{(\nu)}(t)$ varies linearly with slope $q$. At the point $T_2$, the second jump occurs, which is of size $\xi_2$, and so on.
Generalised renewal processes are evidently a generalisation of random walks $S_k$ (for $\tau_j \equiv 1$, $q = 0$) and renewal processes $\eta(t) = \nu(t) + 1$ (for $\xi_j \equiv 1$, $q = 0$). They are widespread in applications, as mathematical models of various physical systems.
Along with the process $S_{(\nu)}(t)$, we will consider generalised renewal processes of the form

$$S(t) = qt + S_{\eta(t)} = S_{(\nu)}(t) + \xi_{\eta(t)},$$

that are in a certain sense more convenient to analyse since $\eta(t)$ is a Markov time with respect to $\mathfrak{F}_n = \sigma(\tau_1, \ldots, \tau_n;\ \xi_1, \ldots, \xi_n)$ and has already been well studied.
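The two processes can be evaluated along a given finite realisation directly from the definitions. The following is a minimal sketch; the helper name `make_processes` and the sample data are hypothetical, and the evaluation of $S(t)$ is only valid for $t$ smaller than the last available renewal epoch.

```python
from bisect import bisect_right
from itertools import accumulate


def make_processes(taus, xis, q=0.0):
    """Build nu(t), S_(nu)(t) and S(t) from finite lists of tau_j >= 0 and xi_j."""
    T = list(accumulate(taus))           # T_1, T_2, ...
    S = [0.0] + list(accumulate(xis))    # S_0, S_1, S_2, ...

    def nu(t):
        # nu(t) = max{k : T_k <= t}; bisect_right counts the epochs T_k <= t
        return bisect_right(T, t)

    def s_nu(t):
        # S_(nu)(t) = q t + S_{nu(t)}: right-continuous, jump xi_k at T_k
        return q * t + S[nu(t)]

    def s_eta(t):
        # S(t) = q t + S_{eta(t)} with eta(t) = nu(t) + 1 (requires t < T_last)
        return q * t + S[nu(t) + 1]

    return nu, s_nu, s_eta


if __name__ == "__main__":
    nu, s_nu, s_eta = make_processes([1, 2, 3], [5, -1, 4], q=2.0)
    for t in (0.5, 1.0, 2.9, 3.0):
        print(t, nu(t), s_nu(t), s_eta(t))
```

The difference `s_eta(t) - s_nu(t)` is exactly the jump $\xi_{\eta(t)}$ that has not yet occurred at time $t$.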

The fact that the asymptotic properties of the processes $S(t)$ and $S_{(\nu)}(t)$, as $t \to \infty$, (the law of large numbers, the central limit theorem) are identical follows from the next assertion, which shows that the difference $S(t) - S_{(\nu)}(t)$ has a proper limiting distribution.


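Lemma 10.6.1 If $\mathbf{E}\tau < \infty$, then the following limiting distribution exists:

$$\lim_{t \to \infty} \mathbf{P}(\xi_{\eta(t)} < v) = \frac{\mathbf{E}(\tau;\ \xi < v)}{\mathbf{E}\tau}.$$

The lemma implies that $\xi_{\eta(t)}/b(t) \xrightarrow{p} 0$ for any function $b(t) \to \infty$ as $t \to \infty$.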


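Proof By virtue of the key renewal theorem,

$$\mathbf{P}(\xi_{\eta(t)} < v) = \sum_{k=0}^{\infty} \int_0^t \mathbf{P}(T_k \in du)\, \mathbf{P}(\tau > t - u,\ \xi < v)$$

$$= \int_0^t dH(u)\, \mathbf{P}(\tau > t - u,\ \xi < v) \to \frac{1}{\mathbf{E}\tau} \int_0^{\infty} \mathbf{P}(\tau > u,\ \xi < v)\, du = \frac{\mathbf{E}(\tau;\ \xi < v)}{\mathbf{E}\tau}.$$

The lemma is proved.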
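The limit in Lemma 10.6.1 can be compared with a simulation in a fully dependent case. The sketch below assumes $\xi_j = \tau_j$ with $\tau_j$ exponential of unit rate, so that $\xi_{\eta(t)}$ is the length of the renewal interval covering $t$ and $\mathbf{E}(\tau;\ \tau < v)/\mathbf{E}\tau = 1 - e^{-v}(1 + v)$; the distribution, the value of $t$, the number of trials and the seed are all assumptions of the example.

```python
import math
import random


def xi_at_eta(t, rng):
    """Simulate xi_{eta(t)} for xi_j = tau_j ~ Exp(1): the interval covering t."""
    total = 0.0
    while True:
        tau = rng.expovariate(1.0)
        total += tau
        if total > t:          # this tau is tau_{eta(t)}
            return tau


def empirical_cdf(v, t=20.0, trials=20000, seed=0):
    """Monte Carlo estimate of P(xi_{eta(t)} < v)."""
    rng = random.Random(seed)
    return sum(xi_at_eta(t, rng) < v for _ in range(trials)) / trials


def limit_cdf(v):
    """E(tau; tau < v) / E tau for tau ~ Exp(1), i.e. the integral of u e^{-u} over [0, v]."""
    return 1.0 - math.exp(-v) * (1.0 + v)


if __name__ == "__main__":
    for v in (0.5, 1.0, 2.0, 4.0):
        print(v, empirical_cdf(v), limit_cdf(v))
```

The limiting law differs from the law of $\tau$ itself: longer intervals are more likely to cover the point $t$.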




As was already noted, $\eta(t)$ is a stopping time with respect to $\mathfrak{F}_n = \sigma(\tau_1, \ldots, \tau_n;\ \xi_1, \ldots, \xi_n)$. Therefore, if $(\tau_j, \xi_j)$ are identically distributed, then by the Wald identity (see Theorem 4.4.2 and Example 4.4.5)

$$\mathbf{E}S(t) = qt + a_\xi \mathbf{E}\eta(t) \sim qt + \frac{a_\xi t}{a} \tag{10.6.1}$$

as $t \to \infty$, where $a_\xi = \mathbf{E}\xi$ and $a = \mathbf{E}\tau$. The second moments of $S(t)$ will be found in Sect. 15.2. The laws of large numbers for $S(t)$ will be established in Sect. 11.5.


10.6.2 The Central Limit Theorem 


In order to simplify the exposition, we first assume that the components $\tau_j$ and $\xi_j$ of the vectors $(\tau_j, \xi_j) \overset{d}{=} (\tau, \xi)$ are independent. Moreover, without losing generality, we assume that $q = 0$.


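Theorem 10.6.1 Let there exist $\sigma^2 = \operatorname{Var}\tau < \infty$, $\sigma_\xi^2 = \operatorname{Var}(\xi) < \infty$ with $\sigma + \sigma_\xi > 0$. If the coordinates $\tau$ and $\xi$ are independent then, as $t \to \infty$,

$$\frac{S(t) - rt}{\sigma_S\sqrt{t}} \Rightarrow \Phi_{0,1},$$

where $r = a_\xi/a$ and $\sigma_S^2 = a^{-1}(\sigma_\xi^2 + r^2\sigma^2) = a^{-1}\operatorname{Var}(\xi - r\tau)$. The same assertion holds true for $S_{(\nu)}(t)$ as well.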

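Proof If one of the values $\sigma$ and $\sigma_\xi$ is zero, then the assertion of the theorem follows from Theorems 8.2.1 and 10.5.2. Therefore we can assume that $\sigma > 0$ and $\sigma_\xi > 0$. Denote by $\mathfrak{G} = \sigma(\tau_1, \tau_2, \ldots)$ the $\sigma$-algebra generated by the sequence $\{\tau_j\}$ and by $A_t \in \mathfrak{G}$ the set

$$A_t = \{ |\eta(t) - t/a| < t^{1/2 + \varepsilon} \}, \qquad \varepsilon \in (0, 1/2).$$

Since by the central limit theorem $\mathbf{P}(A_t) \to 1$ as $t \to \infty$, for any trajectory $\eta(\cdot)$ in $A_t$ we have $\eta(t) \to \infty$ as $t \to \infty$, and the random variables

$$Z(t) = \frac{S(t) - a_\xi \eta(t)}{\sigma_\xi\sqrt{\eta(t)}}$$

are asymptotically normal with parameters (0, 1) by the independence of $\{\xi_j\}$ and $\{\tau_j\}$. In other words, on the sets $A_t$,

$$\mathbf{E}\bigl(e^{i\lambda Z(t)} \mid \mathfrak{G}\bigr) \to e^{-\lambda^2/2} \quad \text{as } t \to \infty.$$

Since

$$\eta(t) = \frac{t}{a} + \frac{\sigma\sqrt{t}}{a^{3/2}}\, \zeta_t, \qquad \zeta_t \Rightarrow \Phi_{0,1}, \qquad \text{and} \quad \eta(t) \sim \frac{t}{a}$$

on the sets $A_t \in \mathfrak{G}$, we also have on the sets $A_t$ the relation

$$\mathbf{E}\left( \exp\left\{ \frac{i\lambda\bigl(S(t) - rt - a_\xi\sigma\sqrt{t}\,\zeta_t / a^{3/2}\bigr)}{\sigma_\xi\sqrt{t/a}} \right\} \,\Big|\, \mathfrak{G} \right) \to e^{-\lambda^2/2}.$$

Since the random variables $\zeta_t$ and $\eta(t)$ are measurable with respect to $\mathfrak{G}$, the corresponding factor can be taken outside of the conditional expectation, so that

$$\mathbf{E}\left( \exp\left\{ \frac{i\lambda(S(t) - rt)}{\sigma_\xi\sqrt{t/a}} \right\} \,\Big|\, \mathfrak{G} \right) \sim \exp\left\{ -\frac{\lambda^2}{2} + \frac{i\lambda r\sigma\zeta_t}{\sigma_\xi} \right\}.$$

Hence

$$\mathbf{E}\exp\left\{ \frac{i\lambda(S(t) - rt)}{\sigma_\xi\sqrt{t/a}} \right\} = o(1) + \mathbf{E}\left( \exp\left\{ -\frac{\lambda^2}{2} + \frac{i\lambda r\sigma\zeta_t}{\sigma_\xi} \right\};\ A_t \right) = o(1) + \exp\left\{ -\frac{\lambda^2}{2}\left[ 1 + \left( \frac{r\sigma}{\sigma_\xi} \right)^{2} \right] \right\}.$$

This means that

$$\frac{S(t) - rt}{\sigma_S\sqrt{t}} \Rightarrow \Phi_{0,1},$$

where

$$\sigma_S^2 = \frac{\sigma_\xi^2}{a}\left[ 1 + \left( \frac{r\sigma}{\sigma_\xi} \right)^{2} \right] = a^{-1}(\sigma_\xi^2 + r^2\sigma^2).$$

The assertion corresponding to $S_{(\nu)}(t)$ follows from Lemma 10.6.1. The theorem is proved.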


Note that Theorems 8.2.1 and 10.5.2 are special cases of Theorem 10.6.1. If $a_\xi = 0$, then $S(t)$ is asymptotically distributed identically to $S_{[t/a]}$, and the limiting law is independent of $\sigma$.

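Now consider the general case where $\tau$ and $\xi$ are, generally speaking, dependent. Since $T_{\eta(t)} = t + \chi(t)$, we have the representation

$$S(t) - rt = Z_{\eta(t)} + r\chi(t), \tag{10.6.2}$$

where

$$Z_n = \sum_{j=1}^{n} \zeta_j, \qquad \zeta_j = \xi_j - r\tau_j, \qquad \mathbf{E}\zeta_j = 0, \qquad \frac{\chi(t)}{\sqrt{t}} \xrightarrow{p} 0$$

as $t \to \infty$ ($\chi(t)$ has a proper limiting distribution as $t \to \infty$). Moreover, we will use yet another Wald identity

$$\mathbf{E}Z_{\eta(t)}^2 = d^2\, \mathbf{E}\eta(t), \qquad d^2 = \mathbf{E}\zeta^2, \qquad \zeta = \xi - r\tau, \tag{10.6.3}$$

that is derived below in Sect. 15.2.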


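Theorem 10.6.2 Let $(\tau_j, \xi_j) \overset{d}{=} (\tau, \xi)$ be independent identically distributed and such that $\sigma^2 = \operatorname{Var}(\tau) < \infty$ and $\sigma_\xi^2 = \operatorname{Var}(\xi) < \infty$ exist. Then

$$\frac{S(t) - rt}{\sigma_S\sqrt{t}} \Rightarrow \Phi_{0,1},$$

where $r = a_\xi/a$ and $\sigma_S^2 = a^{-1}d^2$. The random variables

$$\frac{S_{(\nu)}(t) - rt}{\sigma_S\sqrt{t}} \quad \text{and} \quad \frac{Z_{\eta(t)}}{\sigma_S\sqrt{t}}$$

have the same limiting distribution.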


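Proof It is seen from (10.6.2) that it suffices to prove that

$$\frac{Z_{\eta(t)}}{\sigma_S\sqrt{t}} \Rightarrow \Phi_{0,1}.$$

The main contribution to $Z_{\eta(t)}$ comes from $Z_m$ with $m = [t/a - 2N\sqrt{t}]$, $N \to \infty$, $N = o(\sqrt{t})$, where

$$\frac{Z_m}{\sigma_S\sqrt{t}} = \frac{Z_m}{d\sqrt{m}} \sqrt{\frac{ma}{t}} \Rightarrow \Phi_{0,1}.$$

The remainder $Z_{\eta(t)} - Z_m$, for each fixed

$$T_m \in I_N := [t - 3aN\sqrt{t},\ t - aN\sqrt{t}], \qquad \mathbf{P}(T_m \in I_N) \to 1,$$

has the same distribution as $Z_{\eta(t - T_m)}$, and its variance (see (10.6.3)) is equal to

$$d^2\, \mathbf{E}\eta(t - T_m) \sim \frac{d^2(t - T_m)}{a} \le 3d^2 N\sqrt{t} = o(t).$$

Since $\mathbf{E}Z_{\eta(t - T_m)} = 0$, we have

$$\frac{Z_{\eta(t - T_m)}}{\sqrt{t}} \xrightarrow{p} 0 \tag{10.6.4}$$

as $t \to \infty$. The theorem is proved.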
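The normalising constants of Theorems 10.6.1 and 10.6.2 depend only on the first two moments of $(\tau, \xi)$. The helper below is an illustrative sketch (the function name and argument names are assumptions); it uses $d^2 = \operatorname{Var}(\xi - r\tau) = \sigma_\xi^2 - 2r\operatorname{Cov}(\tau, \xi) + r^2\sigma^2$, which follows from $\mathbf{E}\zeta = 0$.

```python
def clt_parameters(a, a_xi, var_tau, var_xi, cov=0.0):
    """Return (r, sigma_S^2) for the centring r*t and the scaling sigma_S*sqrt(t).

    a, a_xi  -- E tau and E xi (a > 0)
    var_tau  -- sigma^2 = Var(tau)
    var_xi   -- sigma_xi^2 = Var(xi)
    cov      -- Cov(tau, xi); zero in the independent case of Theorem 10.6.1
    """
    r = a_xi / a
    d2 = var_xi - 2.0 * r * cov + r * r * var_tau   # d^2 = Var(xi - r*tau)
    return r, d2 / a                                 # sigma_S^2 = d^2 / a


if __name__ == "__main__":
    # Independent components: sigma_S^2 = (var_xi + r^2 var_tau) / a
    print(clt_parameters(a=2.0, a_xi=3.0, var_tau=4.0, var_xi=5.0))
    # Degenerate dependence xi = 2*tau: zeta = 0, so sigma_S^2 = 0
    print(clt_parameters(a=1.0, a_xi=2.0, var_tau=1.0, var_xi=4.0, cov=2.0))
```

In the second call $\xi = r\tau$ exactly, so $Z_n \equiv 0$ and $S(t) - rt = r\chi(t)$ stays bounded in probability, in agreement with (10.6.2).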


Note that, for $N \to \infty$ slowly enough, relation (10.6.4) can be derived using not (10.6.3), but the law of large numbers for generalised renewal processes that is obtained in Sect. 11.5.

Theorem 10.6.1 could be proved in a somewhat different way, with the help of the local Theorem 10.5.8. We will illustrate this approach by the proof of the integro-local theorem for $S(t)$.


10.6.3 The Integro-Local Theorem


In this section we will obtain the integro-local theorem for $S(t)$ in the case of non-lattice $\xi$. In a quite similar way we can obtain local theorems for densities (if they exist) and for the probability $\mathbf{P}(S(t) = k)$ for $q = 0$ for arithmetic $\xi_j$.




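Theorem 10.6.3 Let the conditions of Theorem 10.6.1 hold and, moreover, $\xi$ be non-lattice. Then, for any fixed $\Delta > 0$, as $t \to \infty$,

$$\mathbf{P}(S(t) - rt \in \Delta[x)) = \frac{\Delta}{\sigma_S\sqrt{t}}\, \varphi\!\left( \frac{x}{\sigma_S\sqrt{t}} \right) + o\!\left( \frac{1}{\sqrt{t}} \right), \tag{10.6.5}$$

where the remainder term $o(1/\sqrt{t})$ is uniform in $x$.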


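Proof Since $\xi$ is non-lattice, one has $\sigma_\xi > 0$. If $\sigma = 0$ then the assertion of the theorem follows from Theorem 8.7.1. Therefore we will assume that $\sigma > 0$. By the independence of $\{\xi_j\}$ and $\{\tau_j\}$,

$$\mathbf{P}(S(t) - rt \in \Delta[x)) = \sum_{n=1}^{\infty} \mathbf{P}(\eta(t) = n)\, \mathbf{P}(S_n - rt \in \Delta[x)) = \sum_{n \in M_t} + \sum_{n \notin M_t},$$

where $M_t = \{n : |n - t/a| < t^{1/2} N(t)\}$, $N(t) \to \infty$, $N(t) = o(\sqrt{t})$ as $t \to \infty$. We know the asymptotics of both factors of the terms in the sum from Theorems 8.7.1 and 10.5.8 (see also (10.5.15)). It remains to do the summation, which is unfortunately somewhat cumbersome. At the same time, it presents no substantial difficulties, so we will sketch this part of the proof. If we put $an - t =: u$,

$$P_1(t) := \frac{\Delta}{\sigma_\xi\sqrt{2\pi n}} \exp\left\{ -\frac{(x - ru)^2}{2n\sigma_\xi^2} \right\}, \qquad P_2(t) := \frac{a^{3/2}}{\sigma\sqrt{2\pi t}} \exp\left\{ -\frac{u^2}{2\sigma^2 n} \right\},$$

then

$$\mathbf{P}(S_n - rt \in \Delta[x)) = P_1(t) + o\!\left( \frac{1}{\sqrt{n}} \right).$$

Furthermore,

$$\mathbf{P}(\eta(t) = n) = P_2(t) + o\!\left( \frac{1}{\sqrt{t}} \right)$$

for $n \in M_t$ and $N(t) \to \infty$ slowly enough as $t \to \infty$. Clearly,

$$\sum_{n \notin M_t} = o\!\left( \frac{1}{\sqrt{t}} \right).$$

Since the sums of $P_1(t)$ and $P_2(t)$ are bounded in $n$ by a constant, we have

$$\sum_{n \in M_t} = o\!\left( \frac{1}{\sqrt{t}} \right) + \sum_{n \in M_t} P_1(t) P_2(t).$$

The exponent in the product $P_1(t) P_2(t)$, taken with the negative sign, is equal to

$$\frac{1}{2n}\left[ \frac{(x - ru)^2}{\sigma_\xi^2} + \frac{u^2}{\sigma^2} \right] \sim \frac{a}{2t}\left[ \frac{(d^2 u - rx\sigma^2)^2}{d^2\sigma^2\sigma_\xi^2} + \frac{x^2}{d^2} \right],$$

where $d^2 = r^2\sigma^2 + \sigma_\xi^2$. Since, for $x = o(\sqrt{t}\, N(t))$,

$$\sum_{n \in M_t} \frac{a^{3/2} d}{\sqrt{2\pi t}\, \sigma\sigma_\xi} \exp\left\{ -\frac{a(d^2 u - rx\sigma^2)^2}{2t d^2\sigma^2\sigma_\xi^2} \right\} \to 1$$

as $t \to \infty$ and this sum does not exceed $1 + o(1)$ for all $x$ (this is an integral sum that corresponds to the integral of the density of the normal law), it is easy to derive (10.6.5) from the foregoing.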
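The only algebraic step in the summation is completing the square in $u$ in the exponent of $P_1(t)P_2(t)$, namely $(x - ru)^2/\sigma_\xi^2 + u^2/\sigma^2 = (d^2u - rx\sigma^2)^2/(d^2\sigma^2\sigma_\xi^2) + x^2/d^2$ with $d^2 = r^2\sigma^2 + \sigma_\xi^2$. A quick numerical sanity check of this identity is sketched below; the function names and the sample arguments are chosen only for the illustration.

```python
def exponent_direct(x, u, r, sigma, sigma_xi):
    """(x - r u)^2 / sigma_xi^2 + u^2 / sigma^2."""
    return (x - r * u) ** 2 / sigma_xi ** 2 + u ** 2 / sigma ** 2


def exponent_completed(x, u, r, sigma, sigma_xi):
    """The same quantity after completing the square in u."""
    d2 = r * r * sigma ** 2 + sigma_xi ** 2          # d^2 = r^2 sigma^2 + sigma_xi^2
    square = (d2 * u - r * x * sigma ** 2) ** 2 / (d2 * sigma ** 2 * sigma_xi ** 2)
    return square + x * x / d2


if __name__ == "__main__":
    samples = [(1.0, 2.0, 0.5, 1.5, 0.7), (-3.0, 0.4, 2.0, 1.0, 2.0)]
    for args in samples:
        print(exponent_direct(*args), exponent_completed(*args))
```

The term $x^2/d^2$ left over after summation in $u$ is what produces the factor $\varphi(x/(\sigma_S\sqrt{t}))$ in (10.6.5), since $\sigma_S^2 = d^2/a$.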


We will continue the study of generalised renewal processes in Sect. 11.5. 


Chapter 11 
Properties of the Trajectories of Random Walks. 
Zero-One Laws 


Abstract The chapter begins with Sect. 11.1 establishing the Borel-Cantelli and
Kolmogorov zero-one laws, and also the zero-one law for exchangeable sequences. 
The concepts of lower and upper functions are introduced. Section 11.2 contains 
the first Kolmogorov inequality and several theorems on convergence of random se- 
ries. Section 11.3 presents Kolmogorov’s Strong Law of Large Numbers and Wald’s 
identity for stopping times. Sections 11.4 and 11.5 are devoted to the Strong Law of 
Large Numbers for independent non-identically distributed random variables, and to 
the Strong Law of Large Numbers for generalised renewal processes, respectively. 


11.1 Zero-One Laws. Upper and Lower Functions 


Let, as before, $S_n = \sum_{k=1}^{n} \xi_k$ be the sums of independent random variables $\xi_1, \xi_2, \ldots$. In this chapter we will consider properties of the “whole” trajectories of random walks $\{S_n\}$.

The first limit theorem we proved for the distribution of the sums of independent identically distributed random variables was the law of large numbers: $S_n/n \xrightarrow{p} \mathbf{E}\xi$. One could ask whether the whole trajectory $S_n/n, S_{n+1}/(n+1), \ldots$, starting from some $n$, will be close to $\mathbf{E}\xi$ with a high probability. That is, whether, for any $\varepsilon > 0$, we will have

$$\lim_{n \to \infty} \mathbf{P}\left( \sup_{k \ge n} \left| \frac{S_k}{k} - \mathbf{E}\xi \right| < \varepsilon \right) = 1. \tag{11.1.1}$$

This is clearly a problem on almost sure convergence, or convergence with probability 1. A similar question arises concerning generalised renewal processes discussed in Sect. 10.6.

Assertion (11.1.1), which is called the strong law of large numbers and is to be proved in this chapter, is a special case of the so-called zero-one laws. As the first such law, we will now present the Borel-Cantelli zero-one law.


11.1.1 Zero-One Laws 


Theorem 11.1.1 Let $\{A_k\}_{k=1}^{\infty}$ be a sequence of events on a probability space $(\Omega, \mathfrak{F}, \mathbf{P})$, and let $A$ be the event that infinitely many events $A_k$ occur, i.e.


A.A. Borovkov, Probability Theory, Universitext, 315 
DOI 10.1007/978-1-4471-5201-9_11, © Springer-Verlag London 2013 




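$A = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k$ (the event $A$ consists of those $\omega$ that belong to infinitely many $A_k$).

If $\sum_{k=1}^{\infty} \mathbf{P}(A_k) < \infty$, then $\mathbf{P}(A) = 0$. If $\sum_{k=1}^{\infty} \mathbf{P}(A_k) = \infty$ and the events $A_k$ are independent, then $\mathbf{P}(A) = 1$.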


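Proof Assume that $\sum_{k=1}^{\infty} \mathbf{P}(A_k) < \infty$. Denote by $\eta = \sum_{k=1}^{\infty} I(A_k)$ the number of occurrences of events $A_k$. Then $\mathbf{E}\eta = \sum_{k=1}^{\infty} \mathbf{P}(A_k) < \infty$ which certainly means that $\eta$ is a proper random variable: $\mathbf{P}(\eta < \infty) = 1 - \mathbf{P}(A) = 1$.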

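If $A_k$ are independent and $\sum_{k=1}^{\infty} \mathbf{P}(A_k) = \infty$, then, since $\overline{A}_k = \Omega \setminus A_k$ are also independent, we have

$$\mathbf{P}(A) = \lim_{n \to \infty} \mathbf{P}\left( \bigcup_{k=n}^{\infty} A_k \right) = \lim_{n \to \infty} \mathbf{P}\left( \Omega \setminus \bigcap_{k=n}^{\infty} \overline{A}_k \right),$$

$$\mathbf{P}\left( \bigcap_{k=n}^{\infty} \overline{A}_k \right) = \lim_{m \to \infty} \mathbf{P}\left( \bigcap_{k=n}^{m} \overline{A}_k \right) = \lim_{m \to \infty} \prod_{k=n}^{m} \mathbf{P}(\overline{A}_k) = \prod_{k=n}^{\infty} \bigl(1 - \mathbf{P}(A_k)\bigr).$$

Using the inequality $\ln(1 - x) \le -x$ we obtain that

$$\prod_{k=n}^{\infty} \bigl(1 - \mathbf{P}(A_k)\bigr) \le \exp\left\{ -\sum_{k=n}^{\infty} \mathbf{P}(A_k) \right\}.$$

Hence

$$\prod_{k=n}^{\infty} \bigl(1 - \mathbf{P}(A_k)\bigr) \le e^{-\infty} = 0, \qquad \mathbf{P}(A) = 1.$$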


The theorem is proved. 
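The second half of the proof can be followed on a concrete divergent series. The sketch below assumes independent events with $\mathbf{P}(A_k) = 1/k$ (an example chosen for illustration, as are the function names); the product then telescopes to $(n-1)/m$, which tends to 0 as $m \to \infty$ and lies below the exponential bound used in the proof.

```python
import math
from fractions import Fraction


def prob_none_occur(n, m):
    """P(no A_k occurs for n <= k <= m) for independent A_k with P(A_k) = 1/k.

    Computed exactly as the product of (1 - 1/k); it telescopes to (n - 1)/m.
    """
    prod = Fraction(1)
    for k in range(n, m + 1):
        prod *= 1 - Fraction(1, k)
    return prod


def exp_bound(n, m):
    """exp(-sum_{k=n}^{m} P(A_k)), the bound obtained from ln(1 - x) <= -x."""
    return math.exp(-sum(1.0 / k for k in range(n, m + 1)))


if __name__ == "__main__":
    for m in (10, 100, 1000, 10000):
        print(m, float(prob_none_occur(5, m)), exp_bound(5, m))
```

Both columns decrease to 0, so with probability 1 at least one $A_k$ with $k \ge n$ occurs, for every $n$.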


Remark 11.1.1 It follows from Theorem 11.1.1 that, for independent events $A_k$, the assertions that $\mathbf{E}\eta < \infty$ and that $\mathbf{P}(\eta < \infty) = 1$ are equivalent to each other. Although in one direction this relationship is obvious, in the opposite direction it is quite meaningful. It implies, in particular, that if $\eta < \infty$ with probability 1, but $\mathbf{E}\eta = \infty$, then $A_k$ are necessarily dependent.

Note also that the argument proving the first part of the theorem has already been 
used for the same purpose in the proof of Theorem 6.1.1. 

Assume that $\{\xi_n\}_{n=1}^{\infty}$ is a sequence of independent random variables given on $(\Omega, \mathfrak{F}, \mathbf{P})$. Denote, as before, by $\sigma(\xi_1, \ldots, \xi_n)$ the $\sigma$-algebra generated by the first $n$ random variables $\xi_1, \ldots, \xi_n$, and by $\sigma(\xi_n, \ldots)$ the $\sigma$-algebra generated by the random variables $\xi_n, \xi_{n+1}, \xi_{n+2}, \ldots$.


Definition 11.1.1 An event $A$ is said to be a tail event if $A \in \sigma(\xi_n, \ldots)$ for any $n > 0$.




For example, the event

$$A = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} \{\xi_k > N\}$$

meaning that there occurred infinitely many events $\{\xi_k > N\}$ is clearly a tail event.


Theorem 11.1.2 (Kolmogorov zero-one law) If A is a tail event, then either 
P(A) =0 or P(A) = 1. 


Proof Since $A$ is a tail event, $A \in \sigma(\xi_{n+1}, \ldots)$, $n \ge 0$. Therefore the event $A$ is independent of the $\sigma$-algebra $\sigma(\xi_1, \ldots, \xi_n)$ for any $n$. Hence (see Theorem 3.4.3) the event $A$ is independent of the $\sigma$-algebra $\sigma(\xi_1, \ldots)$. Since $A \in \sigma(\xi_1, \ldots)$, it is independent of itself:


P(A) = P(AA) = P(A)P(A). 


But this is only possible if P(A) = 0 or 1. The theorem is proved. 


Put $S = \sup\{0, S_1, S_2, \ldots\}$, where $S_n = \sum_{k=1}^{n} \xi_k$. An example of an application of the above theorem is given by the following


Corollary 11.1.1 If $\xi_k$, $k = 1, 2, \ldots$, are independent, then either $\mathbf{P}(S = \infty) = 1$ or $\mathbf{P}(S < \infty) = 1$.

The Proof follows from the fact that $\{S = \infty\}$ is a tail event. Indeed, for any $n$

$$\{S = \infty\} = \{\sup(S_{n-1}, S_n, \ldots) = \infty\} = \{\sup(0, S_n - S_{n-1}, \ldots) = \infty\} \in \sigma(\xi_n, \ldots).$$

Further examples of tail events can be obtained if we consider, for a sequence of independent variables $\xi_1, \xi_2, \ldots$, the event {the series $\sum_{k=1}^{\infty} \xi_k$ is convergent}. Theorem 11.1.2 means that the probability of that event can only be 0 or 1.

If we consider the power series $\sum_{k=1}^{\infty} z^k \xi_k$ where $\xi_k$ are independent, we will see that the convergence radius $\rho = \bigl(\limsup_{k \to \infty} |\xi_k|^{1/k}\bigr)^{-1}$ of this series is a random variable measurable with respect to the $\sigma$-algebra $\sigma(\xi_n, \ldots)$ for any $n$ ($\{\rho < x\} \in \sigma(\xi_n, \ldots)$, $0 \le x \le \infty$). Such random variables are also called tail random variables. Since by the foregoing one has $F(x) = \mathbf{P}(\rho < x) = 0$ or 1, this implies that $\rho$, as well as any other tail random variable, must be equal to a constant with probability 1.

Under the assumption that the elements of the sequence $\{\xi_k\}_{k=1}^{\infty}$ are not only independent but also identically distributed, Kolmogorov’s zero-one law was extended by Hewitt and Savage to a wider class of events.

Let $\omega = (x_1, x_2, \ldots)$ be an element of the sample space $(\mathbb{R}^{\infty}, \mathfrak{B}^{\infty}, \mathbf{P})$ for the sequence $\xi = (\xi_1, \xi_2, \ldots)$ ($\mathbb{R}^{\infty}$ is a countable direct product of the real lines $\mathbb{R}_k$, $k = 1, 2, \ldots$; $\mathfrak{B}^{\infty} = \sigma(\xi_1, \ldots)$ is generated by the sets $\prod_{k=1}^{n} B_k \in \sigma(\xi_1, \ldots, \xi_n)$, where $B_k \in \sigma(\xi_k)$ are Borel sets on the lines $\mathbb{R}_k$).




Definition 11.1.2 An event $A \in \mathfrak{B}^{\infty}$ is said to be exchangeable if

$$(x_1, x_2, \ldots, x_{n-1}, x_n, x_{n+1}, \ldots) \in A$$

implies that $(x_n, x_2, \ldots, x_{n-1}, x_1, x_{n+1}, \ldots) \in A$ for every $n \ge 1$. It is evident that this condition of membership automatically extends to any permutations of finitely many components. Examples of exchangeable events are given by tail events.


Theorem 11.1.3 (Zero-one law for exchangeable events) If $\xi_k$ are independent and identically distributed and $A$ is an exchangeable event, then either $\mathbf{P}(A) = 0$ or $\mathbf{P}(A) = 1$.


Proof By the approximation theorem (Sect. 3.5), for any $A \in \mathfrak{B}^{\infty}$ there exists a sequence of events $A_n \in \sigma(\xi_1, \ldots, \xi_n)$ such that

$$\mathbf{P}(A_n \overline{A} \cup A \overline{A}_n) \to 0$$

as $n \to \infty$.
Introduce the transformation

$$T_n \omega = T_n(x_1, x_2, \ldots) = (x_{n+1}, \ldots, x_{2n}, x_1, \ldots, x_n, x_{2n+1}, \ldots)$$

and put $B_n = T_n A_n$. If $A$ is exchangeable, then $T_n A = A$ and, for any $B \in \mathfrak{B}^{\infty}$, one has $\mathbf{P}(T_n B) = \mathbf{P}(B)$ since $\xi_j$ are independent and identically distributed. Therefore $\mathbf{P}(B_n A) = \mathbf{P}(T_n A_n A) = \mathbf{P}(A_n A)$, and hence $B_n$ will also approximate $A$, which obviously implies that $C_n = A_n B_n$ will have the same approximation property. By independence of $A_n$ and $B_n$, this means that

$$\mathbf{P}(A) = \lim_{n \to \infty} \mathbf{P}(A_n B_n) = \lim_{n \to \infty} \mathbf{P}^2(A_n) = \mathbf{P}^2(A).$$


The theorem is proved. 


11.1.2 Lower and Upper Functions


Theorem 11.1.3 implies the following interesting fact, the statement of which re- 
quires the next definition. 

Definition 11.1.3 For a sequence of random variables $\{\eta_n\}_{n=1}^{\infty}$, a numerical sequence $\{a_n\}_{n=1}^{\infty}$ is said to be an upper sequence (function) if, with probability 1, there occur only finitely many events $\{\eta_n > a_n\}$. A sequence $\{a_n\}_{n=1}^{\infty}$ is said to be a lower sequence (function) if, with probability 1, there occur infinitely many events $\{\eta_n > a_n\}$.


Corollary 11.1.2 If $\xi_k$ are independent and identically distributed, then any sequence $\{a_n\}$ is either upper or lower for the sequence of sums $\{S_n\}_{n=1}^{\infty}$ with $S_n = \sum_{k=1}^{n} \xi_k$.




In other words, one cannot find an “intermediate” sequence $\{a_n\}$ such that the probability of the event $A = \{S_n > a_n \text{ infinitely often}\}$ would be equal, say, to 1/2.


Proof To prove the corollary, it suffices to notice that the event $A$ is exchangeable, because swapping $\xi_1$ and $\xi_n$ in the realisation $(\xi_1, \xi_2, \ldots)$ influences the behaviour of the first $n$ sums $S_1, \ldots, S_n$ only.


A similar fact holds, of course, for the sequence of random variables $\{\xi_n\}_{n=1}^{\infty}$ itself, but, unlike the above corollary, that assertion can be proved more easily, since $B = \{\xi_n > a_n \text{ infinitely often}\}$ is a tail event.


Remark 11.1.2 In regard to the properties of upper and lower sequences for sums $\{S_n\}$ we also note here the following. If $\mathbf{P}(\xi = c) \ne 1$, and $\{a_n\}$ is an upper (lower) sequence for $\{S_n\}$, then, for any fixed $k \ge 0$ and $v$, the sequence $\{b_n = a_{n+k} + v\}_{n=1}^{\infty}$ is also upper (lower) for $\{S_n\}$. This is a consequence of the following relations. Let $v_1 > v_2$ be such that

$$\mathbf{P}(\xi > v_1) > 0, \qquad \mathbf{P}(\xi < v_2) > 0.$$

Then, for the upper sequence $\{a_n\}$ and the event $A = \{S_n > a_n \text{ infinitely many times}\}$, we have

$$0 = \mathbf{P}(A) \ge \mathbf{P}(\xi_1 > v_1)\, \mathbf{P}(A \mid \xi_1 > v_1) \ge \mathbf{P}(\xi_1 > v_1)\, \mathbf{P}(S_n > a_{n+1} - v_1 \text{ infinitely many times}).$$

This implies that the second factor on the right-hand side equals 0, and hence the sequence $\{a_{n+1} - v_1\}$ is also an upper sequence. On the other hand, if $\xi' \overset{d}{=} \xi$ is independent of $\xi$ then

$$0 = \mathbf{P}(A) \ge \mathbf{P}(\xi' + S_n > \xi' + a_n \text{ infinitely many times};\ \xi' < v_2)$$
$$\ge \mathbf{P}(\xi < v_2)\, \mathbf{P}(S_{n+1} > a_n + v_2 \text{ infinitely many times})$$
$$= \mathbf{P}(\xi < v_2)\, \mathbf{P}(S_n > a_{n-1} + v_2 \text{ infinitely many times}).$$

Here the second factor on the right-hand side equals 0, and hence the sequence $\{a_{n-1} + v_2\}$ is also upper. Combining these assertions as many times as necessary, we find that the sequence $\{a_{n+k} + v\}$ is upper for any given $k$ and $v$.


From the above remark it follows, in particular, that the quantities $\limsup_{n \to \infty} S_n$ and $\liminf_{n \to \infty} S_n$ cannot both be finite for a sequence of sums of independent identically distributed random variables that are not zeros with probability 1. Indeed, the event $B = \{\limsup_{n \to \infty} S_n \in (a, b)\}$ is exchangeable and therefore $\mathbf{P}(B) = 0$ or $\mathbf{P}(B) = 1$ by virtue of the zero-one law. If $\mathbf{P}(B)$ were equal to 1, $(b, b, \ldots)$ would be an upper sequence for $\{S_n\}$. But, by our remark, $(a, a, \ldots)$ would then be an upper sequence as well, which would mean that

$$\mathbf{P}\Bigl( \limsup_{n \to \infty} S_n \le a \Bigr) = 1,$$

which contradicts the assumption $\mathbf{P}(B) = 1$.


The reader can also derive from Theorem 11.1.3 that, for any sequences $\{a_n\}$ and $\{b_n\}$, the random variables

$$\limsup_{n \to \infty} \frac{S_n - a_n}{b_n} \quad \text{and} \quad \liminf_{n \to \infty} \frac{S_n - a_n}{b_n}$$

are constant with probability 1.


11.2 Convergence of Series of Independent Random Variables 


In the present section we will discuss in more detail convergence of series of inde- 
pendent random variables. We already know that such series converge with proba- 
bility 1 or 0. We are interested in conditions ensuring convergence. 

First of all we answer the following interesting question. It is well known that the series $\sum_{n=1}^{\infty} n^{-\alpha}$ is divergent for $\alpha \le 1$, while the alternating series $\sum_{n=1}^{\infty} (-1)^n n^{-\alpha}$ converges for any $\alpha > 0$ (the difference between neighbouring elements is of order $\alpha n^{-\alpha - 1}$). What can be said about the behaviour of the series $\sum_{n=1}^{\infty} \delta_n n^{-\alpha}$, where $\delta_n$ are identically distributed and independent with $\mathbf{E}\delta_n = 0$ (for instance, $\delta_n = \pm 1$ with probabilities 1/2)?

One of the main approaches to studying such problems is based on elucidating the relationship between a.s. convergence and the simpler notion of convergence in probability. It is known that, generally speaking, convergence in probability $\zeta_n \xrightarrow{p} \zeta$ does not imply a.s. convergence. However, in our situation, when $\zeta_n = S_n = \sum_{k=1}^{n} \xi_k$ with $\xi_k$ being independent, the implication does hold. The main assertion of the present section is the following.


Theorem 11.2.1 If $\xi_k$ are independent and $S_n = \sum_{k=1}^{n} \xi_k$, then convergence of $S_n$ in probability implies a.s. convergence of $S_n$.


We will prove that $S_n$ is a Cauchy sequence. To do this, we will need the following inequality.


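Lemma 11.2.1 (The First Kolmogorov inequality) If $\xi_j$ are independent and, for some $b > 0$ and all $j \le n$,

$$\mathbf{P}(|S_n - S_j| \ge b) \le p < 1,$$

then

$$\mathbf{P}\Bigl( \max_{j \le n} |S_j| \ge x \Bigr) \le \frac{1}{1 - p}\, \mathbf{P}(|S_n| > x - b). \tag{11.2.1}$$


Corollary 11.2.1 If $\mathbf{E}\xi_j = 0$ then

$$\mathbf{P}\Bigl( \max_{j \le n} |S_j| \ge x \Bigr) \le 2\mathbf{P}\bigl( |S_n| > x - \sqrt{2\operatorname{Var}(S_n)} \bigr).$$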



Kolmogorov actually established this last inequality (Lemma 11.2.1 is an insignificant extension of it). It follows from (11.2.1) with $p = 1/2$, since by the Chebyshev inequality

$$\mathbf{P}\bigl( |S_n - S_j| \ge \sqrt{2\operatorname{Var}(S_n)} \bigr) \le \frac{\operatorname{Var}(S_n - S_j)}{2\operatorname{Var}(S_n)} \le \frac{1}{2}.$$


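Proof of Lemma 11.2.1 Let

$$\eta := \min\{k \ge 1 : |S_k| \ge x\}.$$

Put $A_j := \{\eta = j\}$, $j = 1, 2, \ldots$. Clearly, $A_j$ are disjoint events and hence

$$\mathbf{P}(|S_n| > x - b) \ge \sum_{j=1}^{n} \mathbf{P}(|S_n| > x - b;\ A_j) \ge \sum_{j=1}^{n} \mathbf{P}(|S_n - S_j| < b;\ A_j).$$

(The last inequality holds because the event $\{|S_n - S_j| < b\}A_j$ implies $\{|S_n| > x - b\}A_j$.) But $A_j \in \sigma(\xi_1, \ldots, \xi_j)$ and $\{|S_n - S_j| < b\} \in \sigma(\xi_{j+1}, \ldots, \xi_n)$. Therefore these two events are independent and

$$\mathbf{P}(|S_n| > x - b) \ge \sum_{j=1}^{n} \mathbf{P}(A_j)\, \mathbf{P}(|S_n - S_j| < b) \ge (1 - p) \sum_{j=1}^{n} \mathbf{P}(A_j) = (1 - p)\, \mathbf{P}\Bigl( \max_{j \le n} |S_j| \ge x \Bigr).$$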


The lemma is proved. 
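For a short symmetric $\pm 1$ walk both sides of the inequality of Corollary 11.2.1 can be computed exactly by enumerating all paths. The sketch below is only an illustration on this one assumed example (where $\operatorname{Var}(S_n) = n$); the function name is hypothetical.

```python
import math
from itertools import product


def check_kolmogorov(n, x):
    """Return (P(max_{j<=n} |S_j| >= x), 2 P(|S_n| > x - sqrt(2 Var S_n))).

    Steps are +1 or -1 with probability 1/2 each, so all 2^n paths are
    equally likely and Var(S_n) = n.
    """
    threshold = x - math.sqrt(2 * n)
    hits_max = hits_end = 0
    total = 0
    for steps in product((-1, 1), repeat=n):
        s, running_max = 0, 0
        for step in steps:
            s += step
            running_max = max(running_max, abs(s))
        hits_max += running_max >= x      # the maximum reached level x
        hits_end += abs(s) > threshold    # the endpoint is beyond x - sqrt(2n)
        total += 1
    return hits_max / total, 2 * hits_end / total


if __name__ == "__main__":
    for x in range(1, 11):
        print(x, check_kolmogorov(10, x))
```

For small $x$ the right-hand side exceeds 1 and the bound is trivial; it becomes informative only when $x$ is noticeably larger than $\sqrt{2\operatorname{Var}(S_n)}$.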


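Proof of Theorem 11.2.1 It suffices to prove that $\{S_n\}$ is a.s. a Cauchy sequence, i.e. that, for any $\varepsilon > 0$,

$$\mathbf{P}\Bigl( \sup_{n \ge m} |S_n - S_m| > 2\varepsilon \Bigr) \to 0 \tag{11.2.2}$$

as $m \to \infty$. Let

$$A_{n,m}^{\varepsilon} := \{|S_n - S_m| > \varepsilon\}, \qquad A_m^{\varepsilon} := \bigcup_{n \ge m} A_{n,m}^{\varepsilon}.$$

Then relation (11.2.2) can be written as

$$\mathbf{P}(A_m^{2\varepsilon}) \to 0 \tag{11.2.3}$$

as $m \to \infty$.

Since $\{S_n\}$ is a Cauchy sequence in probability, one has

$$p_{m,M} := \sup_{m \le n \le M} \mathbf{P}(A_{n,M}^{\varepsilon}) \to 0$$

as $m \to \infty$ and $M \to \infty$, so that $p_{m,M} < 1/2$ for all $m$ and $M$ large enough. For such $m$ and $M$ we have by Lemma 11.2.1, for $b = \varepsilon$ and $x = 2\varepsilon$, that

$$\mathbf{P}\Bigl( \sup_{m \le n \le M} |S_n - S_m| > 2\varepsilon \Bigr) = \mathbf{P}\left( \bigcup_{n=m+1}^{M} A_{n,m}^{2\varepsilon} \right) \le \frac{1}{1 - p_{m,M}}\, \mathbf{P}(A_{M,m}^{\varepsilon}) \le 2\mathbf{P}(A_{M,m}^{\varepsilon}).$$

By the properties of probability,

$$\mathbf{P}(A_m^{2\varepsilon}) = \lim_{M \to \infty} \mathbf{P}\left( \bigcup_{n=m+1}^{M} A_{n,m}^{2\varepsilon} \right) \le 2 \limsup_{M \to \infty} \mathbf{P}(A_{M,m}^{\varepsilon}). \tag{11.2.4}$$

Denote by $S$ the limit (in probability) of the sequence $S_n$, and

$$B_n^{\varepsilon} := \{|S_n - S| > \varepsilon\}.$$

Then $\mathbf{P}(B_n^{\varepsilon}) \to 0$ as $n \to \infty$, $A_{M,m}^{\varepsilon} \subset B_M^{\varepsilon/2} \cup B_m^{\varepsilon/2}$, and by (11.2.4)

$$\mathbf{P}(A_m^{2\varepsilon}) \le 2\mathbf{P}(B_m^{\varepsilon/2}) \to 0 \quad \text{as } m \to \infty.$$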


Relation (11.2.3), and hence the assertion of the theorem, are proved. 


Corollary 11.2.2 If $\mathbf{E}\xi_k = 0$ and $\sum_{k=1}^{\infty} \operatorname{Var}(\xi_k) < \infty$, then $S_n$ converges a.s.


Proof The assertion follows immediately from Theorem 11.2.1 and the fact that $\{S_n\}$ is a Cauchy sequence in mean quadratic ($\mathbf{E}(S_n - S_m)^2 = \sum_{k=m+1}^{n} \operatorname{Var}(\xi_k) \to 0$ as $m \to \infty$ and $n \to \infty$) and hence in probability.

It turns out that if $\mathbf{E}\xi_k = 0$ and $|\xi_k| < c$ for all $k$, then the condition $\sum_{k=1}^{\infty} \operatorname{Var}(\xi_k) < \infty$ is necessary and sufficient for a.s. convergence of $S_n$.¹

Corollary 11.2.2 also contains an answer to the question posed at the beginning of the section about convergence of $\sum \delta_n n^{-\alpha}$, where $\delta_n$ are independent and identically distributed and $\mathbf{E}\delta_n = 0$.


Corollary 11.2.3 The series $\sum \delta_n a_n$ converges with probability 1 if $\operatorname{Var}(\delta_k) = \sigma^2 < \infty$ and $\sum a_n^2 < \infty$.


Thus we obtain that the series $\sum \delta_n n^{-\alpha}$, where $\delta_n = \pm 1$ with probabilities 1/2, is convergent if and only if $\alpha > 1/2$.
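The dichotomy at $\alpha = 1/2$ can be seen in the variance of a tail block $\sum_{k=m+1}^{M} \delta_k k^{-\alpha}$, which equals $\sum_{k=m+1}^{M} k^{-2\alpha}$, and in the oscillation of the simulated partial sums over that block. The following sketch is illustrative only; the block limits, the seed and the function names are assumptions.

```python
import random


def tail_oscillation(alpha, start, stop, seed=0):
    """max over start < n <= stop of |S_n - S_start| for S_n = sum delta_k k^(-alpha).

    Only the increments with start < k <= stop are simulated; by
    independence this has the same law as the oscillation of the full series.
    """
    rng = random.Random(seed)
    s, worst = 0.0, 0.0
    for k in range(start + 1, stop + 1):
        s += rng.choice((-1.0, 1.0)) * k ** (-alpha)
        worst = max(worst, abs(s))
    return worst


def tail_variance(alpha, start, stop):
    """Var(S_stop - S_start) = sum of k^(-2 alpha) over start < k <= stop."""
    return sum(k ** (-2.0 * alpha) for k in range(start + 1, stop + 1))


if __name__ == "__main__":
    for alpha in (0.5, 0.75, 1.0):
        print(alpha, tail_variance(alpha, 5000, 10000),
              tail_oscillation(alpha, 5000, 10000))
```

For $\alpha = 1/2$ the block variance stays near $\ln 2$ however far out the block is taken (doubling the index range), whereas for $\alpha > 1/2$ it tends to zero, which is the Cauchy property in mean quadratic used in Corollary 11.2.2.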
An extension of Corollary 11.2.2 is given by the following. 


Corollary 11.2.4 (The two series theorem) A sufficient condition for a.s. convergence of the series $\sum \xi_n$ is that the series $\sum \mathbf{E}\xi_n$ and $\sum \operatorname{Var}(\xi_n)$ are convergent.


The Proof is obvious, for the sequences $\sum_{k=1}^{n} \mathbf{E}\xi_k$ and $\sum_{k=1}^{n} (\xi_k - \mathbf{E}\xi_k)$ converge a.s. by Corollary 11.2.2.


¹ For more detail, see e.g. [31].




11.3 The Strong Law of Large Numbers 


It is not hard to see that, using the terminology of Sect. 11.1, the strong law of large numbers (11.1.1) means that, for any $\varepsilon > 0$, the sequence $\{\varepsilon n\}_{n=1}^{\infty}$ is an upper one for both sequences $\{S_n\}$ and $\{-S_n\}$ only if $\mathbf{E}\xi_1 = 0$.

We will derive the strong law of large numbers as a corollary of Theorem 10.5.3 on finiteness of the infimum of sums of random variables.

Let, as before, $\xi_1, \xi_2, \ldots$ be independent and identically distributed, $\xi_k \overset{d}{=} \xi$.

Theorem 11.3.1 (Kolmogorov’s Strong Law of Large Numbers) A necessary and sufficient condition for $S_n/n \xrightarrow{a.s.} a$ is that there exists $\mathbf{E}\xi = a$.


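Proof Sufficiency. Assume, without loss of generality, that $\mathbf{E}\xi_1 = 0$. Then it follows from Theorem 10.5.3 that the random variable $Z^{(\varepsilon)} = \inf_{k \ge 0}(S_k + \varepsilon k)$ is proper for any $\varepsilon > 0$ ($S_k + \varepsilon k$ is a sum of random variables $\xi_j + \varepsilon$ with $\mathbf{E}(\xi_j + \varepsilon) > 0$). Therefore,

$$\mathbf{P}\Bigl( \inf_{k \ge n} \frac{S_k}{k} < -2\varepsilon \Bigr) \le \mathbf{P}\Bigl( \inf_{k \ge n} (S_k + \varepsilon k) < -\varepsilon n \Bigr) \le \mathbf{P}(Z^{(\varepsilon)} < -\varepsilon n) \to 0$$

as $n \to \infty$. In a similar way we find that

$$\mathbf{P}\Bigl( \sup_{k \ge n} \frac{S_k}{k} > 2\varepsilon \Bigr) \to 0 \quad \text{as } n \to \infty.$$

Since $\mathbf{P}(\sup_{k \ge n} |S_k/k| > 2\varepsilon)$ does not exceed the sum of the above two probabilities, we obtain that $S_n/n \xrightarrow{a.s.} 0$.

Necessity. Note that

$$\frac{\xi_n}{n} = \frac{S_n}{n} - \frac{n - 1}{n} \cdot \frac{S_{n-1}}{n - 1} \xrightarrow{a.s.} 0,$$

so that the event $\{|\xi_n/n| > 1\}$ occurs finitely often with probability 1. By the Borel-Cantelli zero-one law, this means that $\sum_{n=1}^{\infty} \mathbf{P}(|\xi_n/n| > 1) < \infty$ or, which is the same, $\sum \mathbf{P}(|\xi| > n) < \infty$. Therefore, by Lemma 10.5.1, $\mathbf{E}|\xi| < \infty$ and with necessity $\mathbf{E}\xi = a$. The theorem is proved.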


Thus the condition $\mathbf{E}\xi = 0$ is necessary and sufficient for $\{\varepsilon n\}_{n=1}^{\infty}$ to be an upper sequence for both sequences $\{S_n\}$ and $\{-S_n\}$. In the next chapter, we will derive necessary and sufficient conditions for $\{\varepsilon n\}$ to be an upper sequence for each of the trajectories $\{S_n\}$ and $\{-S_n\}$ separately. Of course, such a condition, say, for the sequence $\{S_n\}$ will be broader than just $\mathbf{E}\xi_k = 0$.

We saw that the above proof of the strong law of large numbers was based on Theorem 10.5.3 on the finiteness of $\inf S_k$ which is based, in turn, on Wald’s identity stated as Theorem 4.4.3. There exist other approaches to the proof that are unrelated to Theorem 4.4.3 (see below, e.g. Theorems 11.4.2 and 12.3.1). Now we will show that, using the strong law of large numbers, one can prove Wald’s identity for stopping times without any additional restrictions (see e.g. conditions (a)-(d) in Theorem 4.4.3). Furthermore, in our opinion, the proof below better elucidates the nature of the phenomenon we are dealing with.

Consider stopping times $\nu$ with respect to a family of $\sigma$-algebras of a special kind. In particular, we assume that a sequence $\{\zeta_j\}_{j=1}^{\infty}$ of independent identically distributed random vectors $\zeta_j = (\xi_j, \tau_j)$ is given (where $\tau_j$ can also be vectors) and

$$\mathfrak{F}_n := \sigma(\zeta_1, \ldots, \zeta_n). \tag{11.3.1}$$


Theorem 11.3.2 (Wald’s identity for stopping times) Let $\nu$ be a stopping time with respect to the family of $\sigma$-algebras $\mathfrak{F}_n$ and assume that one of the following conditions holds: (a) $\mathbf{E}\nu < \infty$; or (b) $a := \mathbf{E}\xi_j \ne 0$.

Then

$$\mathbf{E}S_\nu = a\mathbf{E}\nu. \tag{11.3.2}$$


The assertion of the theorem means that Wald’s identity is true whenever the right-hand side is defined, i.e. only the indefinite case $0 \cdot \infty$ is excluded. Roughly speaking, identity (11.3.2) is valid whenever it makes sense.

This identity implies that, when $\mathbf{E}\nu < \infty$, the condition $a \ne 0$ is superfluous and that, for $a \ne 0$, the finiteness of $\mathbf{E}S_\nu$ implies that of $\mathbf{E}\nu$. If $a = 0$ then the last assertion is not true. The reader can easily illustrate this fact using the fair game discussed in Sect. 4.2.


Proof of Theorem 11.3.2 By the strong law of large numbers, for all large $k$, the ratio $S_k/k$ lies in the vicinity of the point $a$. (Here and in what follows, we leave more precise formulations to the reader.) By Lemma 11.2.1, the sequence $\{\zeta_{\nu+k}\}_{k=1}^{\infty}$ has the same distribution as the original sequence $\{\zeta_k\}_{k=1}^{\infty}$. For this “shifted” sequence, consider the stopping time $\nu_2$ defined the same way as $\nu$ for the original sequence. Put $\nu_1 := \nu$ and consider the sequence $\{\zeta_{\nu_1+\nu_2+k}\}_{k=1}^{\infty}$ which is again distributed as $\{\zeta_k\}_{k=1}^{\infty}$ (for $\nu_1 + \nu_2$ is again a stopping time). For the new sequence, define the stopping time $\nu_3$, and so on. Clearly, the $\nu_k$ are independent and identically distributed, and so are the differences

$$S_{N_k} - S_{N_{k-1}}, \quad k \ge 1, \quad S_0 = 0, \qquad \text{where } N_k := \sum_{j=1}^{k} \nu_j.$$

By virtue of the strong law of large numbers, $S_{N_k}/N_k$ also lie in the vicinity of the point $a$ for all large $k$ (or $N_k$).
If $\mathbf{E}\nu < \infty$ then $N_k/k$ lie in the vicinity of the point $\mathbf{E}\nu$ as $k \to \infty$. Since

$$\frac{S_{N_k}}{N_k} = \frac{S_{N_k}}{k} \cdot \frac{k}{N_k}, \tag{11.3.3}$$

$S_{N_k}/k$ is necessarily in a neighbourhood of the point $a\mathbf{E}\nu$ for all large $k$. This means that the expectation $\mathbf{E}S_\nu = a\mathbf{E}\nu$ exists.

If $\mathbf{E}\nu = \infty$ then, for $a > 0$, the assumption that the expectation $\mathbf{E}S_\nu$ exists and is finite, together with equality (11.3.3) and the previous argument, leads to a contradiction, since the limit of the left-hand side of (11.3.3) equals $a > 0$, but that of the right-hand side is zero. The contradiction vanishes only if $\mathbf{E}S_\nu = \infty$. The case $a < 0$ is dealt with in the same way. The theorem is proved.


We now return to the strong law of large numbers and illustrate it by the following 
example. 


Example 11.3.1 Let $\omega = (\omega_1, \omega_2, \ldots)$ be a sequence of independent random variables taking the values 1 and 0 with probabilities $p$ and $q = 1 - p$, respectively. To each such sequence, we put into correspondence the number

$$\xi = \xi(\omega) = \sum_{k=1}^{\infty} \omega_k 2^{-k},$$

so that $\omega$ is the binary expansion of $\xi$. It is evident that the possible values of $\xi$ fill the interval $[0, 1]$.

We show that if $p = q = 1/2$ then the distribution of $\xi$ is uniform. But if $p \ne 1/2$, then $\xi$ has a singular distribution. Indeed, if $x = \sum_{k=1}^{\infty} \delta_k 2^{-k}$, where the $\delta_k$ assume the values 0 or 1, then

$$\{\xi < x\} = \{\omega_1 < \delta_1\} \cup \{\omega_1 = \delta_1, \omega_2 < \delta_2\} \cup \{\omega_1 = \delta_1, \omega_2 = \delta_2, \omega_3 < \delta_3\} \cup \cdots.$$


Since the events in this union are disjoint, for $p = 1/2$ we have

$$\mathbf{P}(\xi < x) = \sum_{k=0}^{\infty} \mathbf{P}(\omega_1 = \delta_1, \ldots, \omega_k = \delta_k, \omega_{k+1} < \delta_{k+1}) = \sum_{k=0}^{\infty} 2^{-k}\,\mathbf{P}(\omega_{k+1} < \delta_{k+1}) = \sum_{k=0}^{\infty} 2^{-k-1}\delta_{k+1} = x.$$

This means that the distribution of $\xi$ is uniform, i.e. for any Borel set $B \subset [0, 1]$, the probability $\mathbf{P}(\xi \in B) = \operatorname{mes} B$ is equal to the Lebesgue measure of $B$. Put

$$D_n := \sum_{k=1}^{n} \delta_k, \qquad \Omega_n := \sum_{k=1}^{n} \omega_k.$$

Then the set $\{x : \lim_{n\to\infty} D_n/n = p\}$ is Borel measurable and hence

$$\operatorname{mes}\Big\{x : \lim_{n\to\infty} \frac{D_n}{n} = \frac12\Big\} = \mathbf{P}\Big(\lim_{n\to\infty} \frac{\Omega_n}{n} = \frac12\Big).$$

Since by the strong law of large numbers the right-hand side here is equal to one,

$$\operatorname{mes}\Big\{x : \lim_{n\to\infty} \frac{D_n}{n} = \frac12\Big\} = 1.$$

In other words, for almost all $x \in [0, 1]$, the proportion of ones in the binary expansion of $x$ is equal to 1/2.


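Now let $p \ne 1/2$. Then, by the strong law of large numbers,

$$\mathbf{P}\Big(\lim_{n\to\infty} \frac{\Omega_n}{n} = p\Big) = 1,$$

although, as we saw above,

$$\operatorname{mes}\Big\{x : \lim_{n\to\infty} \frac{D_n}{n} = p\Big\} = 0,$$

so that the probability measure is concentrated on a subset of $[0, 1]$ of Lebesgue measure zero. On the other hand, the distribution of the random variable $\xi$ is continuous. This follows from the fact that

$$\{\xi = x\} = \bigcap_{k=1}^{\infty} \{\omega_k = \delta_k\}$$

if $x$ is binary-irrational. If $x$ is binary-rational, i.e. if, for some $r < \infty$, either $\delta_k = 0$ for all $k \ge r$ or $\delta_k = 1$ for all $k \ge r$, the continuity follows from the inclusion

$$\{\xi = x\} \subset \Big\{\bigcap_{k=r}^{\infty} \{\omega_k = 0\}\Big\} \cup \Big\{\bigcap_{k=r}^{\infty} \{\omega_k = 1\}\Big\},$$

since the probabilities of the two events on the right-hand side are clearly equal to zero. The singularity of $F_\xi(x)$ for $p \ne 1/2$ is proved.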


We suggest that the reader plot the distribution function of $\xi$.
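As a starting point for such a plot, the decomposition of $\{\xi < x\}$ used in Example 11.3.1 gives a direct way to evaluate $\mathbf{P}(\xi < x)$ for any $p$: each binary digit $\delta_{k+1} = 1$ of $x$ contributes the probability of the prefix $\omega_1 = \delta_1, \ldots, \omega_k = \delta_k$ times $q$. The sketch below is only an illustration; the function names and the truncation depth of 40 binary digits are assumptions made here, not part of the text.

```python
def binary_digits(x, depth=40):
    """First `depth` binary digits delta_1, delta_2, ... of x in [0, 1)."""
    digits = []
    for _ in range(depth):
        x *= 2
        d = int(x)
        digits.append(d)
        x -= d
    return digits


def cdf_xi(x, p, depth=40):
    """Approximate P(xi < x) for xi = sum_k omega_k 2^{-k}, P(omega_k = 1) = p."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    q = 1.0 - p
    total = 0.0      # running value of P(xi < x)
    prefix = 1.0     # P(omega_1 = delta_1, ..., omega_k = delta_k)
    for d in binary_digits(x, depth):
        if d == 1:
            total += prefix * q   # event {prefix matches, omega_{k+1} = 0 < 1}
            prefix *= p
        else:
            prefix *= q
    return total


# Tabulate the distribution function on a grid for p = 0.3.
for j in range(0, 9):
    x = j / 8
    print(x, cdf_xi(x, 0.3))
```

Feeding the tabulated pairs to any plotting tool gives the graph; for $p = 1/2$ the function reduces to $x$ on dyadic points, in agreement with the uniform case.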


11.4 The Strong Law of Large Numbers for Arbitrary 
Independent Variables 


Finding necessary and sufficient conditions for the convergence

$$S_n/b_n \xrightarrow{\text{a.s.}} a$$

when $b_n \uparrow \infty$ and the summands $\xi_1, \xi_2, \ldots$ are not identically distributed is a difficult task. We first prove the following theorem.


Theorem 11.4.1 (Kolmogorov’s test for almost everywhere convergence) Assume that $\xi_k$, $k = 1, 2, \ldots$, are independent, $\mathbf{E}\xi_k = 0$, $\operatorname{Var}(\xi_k) = \sigma_k^2 < \infty$ and, moreover,

$$\sum_{k=1}^{\infty} \frac{\sigma_k^2}{b_k^2} < \infty. \tag{11.4.1}$$

Then $S_n/b_n \xrightarrow{\text{a.s.}} 0$ as $n \to \infty$.


Proof It follows from the conditions of Theorem 11.4.1 that (see Corollary 11.2.2) the series $\sum_{k=1}^{\infty} \xi_k/b_k$ is convergent with probability 1. Therefore the assertion of Theorem 11.4.1 is a consequence of the following well-known lemma from calculus.


Lemma 11.4.1 Let $b_n \uparrow \infty$ and a sequence $x_1, x_2, \ldots$ be such that the series $\sum_{k=1}^{\infty} x_k$ is convergent. Then, as $n \to \infty$,

$$\frac{1}{b_n} \sum_{k=1}^{n} b_k x_k \to 0.$$


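Proof Put $X_n := \sum_{k=n+1}^{\infty} x_k$, so that $X_n \to 0$ as $n \to \infty$, and $X := \max_{n \ge 0} |X_n| < \infty$. Using the Abel transform, we obtain that

$$\sum_{k=1}^{n} b_k x_k = \sum_{k=1}^{n} b_k (X_{k-1} - X_k) = \sum_{k=0}^{n-1} b_{k+1} X_k - \sum_{k=1}^{n} b_k X_k = \sum_{k=1}^{n-1} (b_{k+1} - b_k) X_k + b_1 X_0 - b_n X_n,$$

$$\limsup_{n\to\infty} \frac{1}{b_n} \sum_{k=1}^{n} b_k x_k \le \limsup_{n\to\infty} \frac{1}{b_n} \sum_{k=1}^{n-1} (b_{k+1} - b_k) X_k. \tag{11.4.2}$$

Here, for a given $\varepsilon > 0$, we can choose an $N$ such that $|X_k| < \varepsilon$ for $k \ge N$. Therefore

$$\sum_{k=1}^{n-1} (b_{k+1} - b_k) X_k \le X \sum_{k=1}^{N-1} (b_{k+1} - b_k) + \varepsilon \sum_{k=N}^{n-1} (b_{k+1} - b_k) = X (b_N - b_1) + \varepsilon (b_n - b_N).$$

From here and (11.4.2) it follows that

$$\limsup_{n\to\infty} \frac{1}{b_n} \sum_{k=1}^{n} b_k x_k \le \varepsilon.$$

Since a similar inequality holds for $\liminf$, the lemma is proved.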
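A quick numerical look at Lemma 11.4.1 may be helpful. The sketch below evaluates the normalised sums $b_n^{-1}\sum_{k\le n} b_k x_k$ for one particular choice, $x_k = k^{-2}$ and $b_k = k$, which is an assumption made purely for illustration (any convergent series and any $b_n \uparrow \infty$ would do). The helper name `kronecker_ratio` is hypothetical.

```python
def kronecker_ratio(x, b, n):
    """Return (1 / b(n)) * sum_{k=1}^{n} b(k) * x(k)."""
    return sum(b(k) * x(k) for k in range(1, n + 1)) / b(n)


# Illustrative choice: x_k = 1/k^2 (convergent series), b_k = k.
x = lambda k: 1.0 / (k * k)
b = lambda k: float(k)

for n in (10, 100, 1000, 10000, 100000):
    print(n, kronecker_ratio(x, b, n))   # the ratios decrease towards 0
```

For this choice the numerator is the harmonic sum, which grows like $\ln n$, so the ratio behaves like $(\ln n)/n$.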


We could also prove Theorem 11.4.1 directly, using the Kolmogorov inequality, 
in a way similar to the argument in Theorem 11.2.1. 


Example 11.4.1 Assume that $\xi_k$, $k = 1, 2, \ldots$, are independent random variables taking the values $\xi_k = \pm k^{\alpha}$ with probabilities 1/2. As we saw in Example 8.4.1, for $\alpha > -1/2$, the sums $S_n$ of these variables are asymptotically normal with the appropriate normalising factor $n^{-\alpha-1/2}$. Since $\operatorname{Var}(\xi_k) = \sigma_k^2 = k^{2\alpha}$, we see that, for $\beta > \alpha + 1/2$, $n^{-\beta} S_n$ satisfies the strong law of large numbers because, for $b_k = k^{\beta}$, the series

$$\sum_{k=1}^{\infty} \sigma_k^2 b_k^{-2} = \sum_{k=1}^{\infty} k^{2\alpha - 2\beta}$$

converges. The “usual” strong law of large numbers (with the normalising factor $n^{-1}$) holds if the value $\beta = 1$ is admissible, i.e. when $\alpha < 1/2$.
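The behaviour described in Example 11.4.1 can be observed in a small simulation. The sketch below is only an illustration under assumed values ($\alpha = 0.25$, $\beta = 1$, a fixed random seed); the function name `normalised_sum` is hypothetical, and a single simulated trajectory is of course not a proof.

```python
import random


def normalised_sum(alpha, beta, n, seed=1):
    """Simulate S_n = sum_k (+-k^alpha) with fair signs and return S_n / n^beta."""
    rng = random.Random(seed)
    s = 0.0
    for k in range(1, n + 1):
        step = k ** alpha
        s += step if rng.random() < 0.5 else -step
    return s / n ** beta


# alpha = 0.25 < 1/2, so the normalisation beta = 1 is admissible.
for n in (100, 10000, 100000):
    print(n, normalised_sum(0.25, 1.0, n))
```

With these values the standard deviation of $S_n/n$ is of order $n^{-1/4}$, so the printed ratios should be small for large $n$, though the decay is slow.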


Now we will derive the “usual” strong law of large numbers (with scaling factor $1/n$) under conditions which do not assume the existence of the variances $\operatorname{Var}(\xi_k)$ and are, in a certain sense, minimal. The following generalisation of the “sufficiency part” of Theorem 11.1.3 is valid.


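Theorem 11.4.2 Let $\mathbf{E}\xi_k = 0$ and the tails $\mathbf{P}(|\xi_k| > t)$ admit a common integrable majorant:

$$\mathbf{P}(|\xi_k| > t) \le g(t), \qquad \int_0^{\infty} g(t)\,dt < \infty. \tag{11.4.3}$$

Then, as $n \to \infty$,

$$\frac{S_n}{n} \xrightarrow{\text{a.s.}} 0. \tag{11.4.4}$$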


Note that condition (11.4.3) can also be rewritten as $|\xi_k| \overset{d}{\le} \zeta$, $\mathbf{E}\zeta < \infty$. To see this, it suffices to consider a random variable $\zeta \ge 0$ for which $\mathbf{P}(\zeta > t) = \min(1, g(t))$. Here, without loss of generality, we can assume that $g(t)$ is nonincreasing (we can take the minimal majorant $g(t) := \sup_k \mathbf{P}(|\xi_k| > t) \downarrow$).

Condition (11.4.3) clearly implies the uniform integrability of $\xi_k$. The latter was sufficient for the law of large numbers, but is insufficient for the strong law of large numbers. This is shown by the following example.


Example 11.4.2 Let $\xi_k$ be such that, for $t > 0$ and $k > 1$,

$$\mathbf{P}(\xi_k \ge t) = \begin{cases} g(t) + \dfrac{1}{k \ln k} & \text{if } t \le k, \\[1ex] g(t) & \text{if } t > k, \end{cases} \qquad \mathbf{P}(\xi_k < -t) \le g(t),$$

where $g(t)$ is integrable, so that $\xi_k$ has a positive atom of size $1/(k \ln k)$ at the point $k$. Evidently, the $\xi_k$ are uniformly integrable. Now suppose that $S_n/n \xrightarrow{\text{a.s.}} 0$. Since

$$\sum_{k=2}^{\infty} \mathbf{P}(\xi_k \ge k) \ge \sum_{k=2}^{\infty} \frac{1}{k \ln k} = \infty,$$

it follows by the Borel-Cantelli lemma that infinitely many events $\{\xi_k \ge k\}$ occur with probability 1. Since, for any $\varepsilon < 1/2$ and all $k$ large enough, $|S_k| < \varepsilon k$ with probability 1, the events $S_{k+1} = S_k + \xi_{k+1} > k(1 - \varepsilon)$ occur infinitely often. We have obtained a contradiction.


Proof of Theorem 11.4.2 Represent the random variables $\xi_k$ in the form

$$\xi_k = \xi_k^* + \xi_k^{**}, \qquad \xi_k^* := \xi_k I(|\xi_k| \le k), \qquad \xi_k^{**} := \xi_k I(|\xi_k| > k),$$

and denote by $S_n^*$ and $S_n^{**}$ the respective sums of random variables $\xi_k^*$ and $\xi_k^{**}$. Then the sum $S_n$ can be written as

$$S_n = (S_n^* - \mathbf{E}S_n^*) + S_n^{**} - \mathbf{E}S_n^{**}. \tag{11.4.5}$$

Now we will evaluate the three summands on the right-hand side of (11.4.5).

1. Since $\xi_k$ are uniformly integrable, we have

$$\mathbf{E}\xi_k^{**} = o(1) \ \text{ as } k \to \infty, \qquad \mathbf{E}S_n^{**} = o(n), \qquad \frac{\mathbf{E}S_n^{**}}{n} \to 0 \ \text{ as } n \to \infty. \tag{11.4.6}$$

2. Since

$$\sum_{k=1}^{\infty} \mathbf{P}(|\xi_k| > k) \le \sum_{k=1}^{\infty} g(k) < \infty,$$

we obtain from Theorem 11.1.1 that, with probability 1, only a finite number of random variables $\xi_k^{**}$ are nonzero and hence, as $n \to \infty$,

$$\frac{S_n^{**}}{n} \xrightarrow{\text{a.s.}} 0. \tag{11.4.7}$$

3. To bound the first summand on the right-hand side of (11.4.5) we make use of Theorem 11.4.1. Since

$$\operatorname{Var}(\xi_k^*) \le \mathbf{E}(\xi_k^*)^2 \le 2\int_0^k u\,\mathbf{P}(|\xi_k| > u)\,du \le 2\int_0^k u g(u)\,du,$$

we see that the series in (11.4.1) for $\xi_k^* - \mathbf{E}\xi_k^*$ and $b_k = k$ admits the upper bound

$$2\sum_{k=1}^{\infty} \frac{1}{k^2}\int_0^k u g(u)\,du. \tag{11.4.8}$$

The last series converges if the integral

$$\int_1^{\infty} \frac{1}{t^2}\Big(\int_0^t u g(u)\,du\Big)\,dt$$

converges. Integrating by parts, we obtain

$$-\frac{1}{t}\int_0^t u g(u)\,du\,\Big|_1^{\infty} + \int_1^{\infty} g(t)\,dt. \tag{11.4.9}$$

The last summand here is clearly finite. Since $g(u)$ is integrable and monotone, one has

$$u g(u) = o(1) \ \text{ as } u \to \infty, \qquad \int_0^t u g(u)\,du = o(t) \ \text{ as } t \to \infty,$$

and hence the value of the first summand in (11.4.9) is zero at $t = \infty$. We have established that series (11.4.8) converges, and hence, by Theorem 11.4.1, as $n \to \infty$,

$$\frac{S_n^* - \mathbf{E}S_n^*}{n} \xrightarrow{\text{a.s.}} 0. \tag{11.4.10}$$


Combining (11.4.5)-(11.4.7) and (11.4.10), we obtain (11.4.4). The theorem is 
proved. 


11.5 The Strong Law of Large Numbers for Generalised 
Renewal Processes 


11.5.1 The Strong Law of Large Numbers for Renewal Processes 


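Let $\{\tau_j\}$ be a sequence of independent identically distributed variables, $T_k := \sum_{j=1}^{k} \tau_j$ and $\eta(t) := \min\{k : T_k > t\}$.


Theorem 11.5.1 If $\tau_j \overset{d}{=} \tau$ and $\mathbf{E}\tau = a > 0$ exists then, as $t \to \infty$,

$$\frac{\eta(t)}{t} \xrightarrow{\text{a.s.}} \frac{1}{a}, \tag{11.5.1}$$

i.e., for any $\varepsilon > 0$,

$$\mathbf{P}\Big(\Big|\frac{\eta(u)}{u} - \frac{1}{a}\Big| < \varepsilon \ \text{for all } u \ge t\Big) \to 1 \tag{11.5.2}$$

as $t \to \infty$.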
Proof First let $\tau > 0$. Set

$$A_n := \Big\{\Big|\frac{T_k}{k} - a\Big| < \varepsilon \ \text{for all } k \ge n\Big\}.$$

The strong law of large numbers for $\{T_k\}$ means that $\mathbf{P}(A_n) \to 1$ as $n \to \infty$.

Consider the function $T(v) := T_{\lfloor v \rfloor}$, where $\lfloor v \rfloor$ is the integer part of $v$. As was noted in Sect. 10.1, $\eta(t)$ is the generalised inverse function to $T(v)$. In other words, if we plot the graph of the function $T(v)$ as a continuous line (including “vertical” segments corresponding to jumps) then $\eta(t)$ can be regarded as the abscissa of the point of intersection of the graph of $T(v)$ with level $t$ (see Fig. 11.1); for the values of $t$ coinciding with $T_k$, the intersection will be a segment of length 1, and $\eta(t)$ is then to be taken equal to the right end point of the segment.

Therefore the event that $T(v)$ lies within the limits $v(a \pm \varepsilon)$ for all sufficiently large $v$ coincides with the event that $\eta(t)$ lies within the limits $t/(a \pm \varepsilon)$ for all sufficiently large $t$. More precisely,

$$A_n \subset B_n := \Big\{\frac{u}{a - \varepsilon} > \eta(u) > \frac{u}{a + \varepsilon} \ \text{for all } u \ge n(a + \varepsilon)\Big\}.$$

This means that

$$\mathbf{P}(B_n) \to 1 \quad \text{as } n \to \infty.$$

This relation is clearly equivalent to (11.5.2).


Fig. 11.1 The relative positions of a trajectory of $T(v)$ and the levels $v(a \pm \varepsilon)$ (see the proof of Theorem 11.5.1)


Now suppose $\tau$ can also assume negative values. Then $\eta(t) := \min\{k : \overline{T}_k > t\}$, where $\overline{T}_k = \max_{j \le k} T_j$, so that $\eta(t)$ is the generalised inverse function of $\overline{T}(v) := \overline{T}_{\lfloor v \rfloor}$. Moreover, it is clear that, if $T(v)$ lies within the limits $v(a \pm \varepsilon)$ for all sufficiently large $v$, then the same is true for the function $\overline{T}(v)$. It remains to repeat the above argument applying it to the processes $\overline{T}(v)$ and $\eta(t)$. The theorem is proved.


Remark 11.5.1 (An analogue of Remarks 8.3.3, 8.4.1, 10.1.1 and 10.5.1) Convergence (11.5.1) persists if we remove all the restrictions on the random variable $\tau_1$. Namely, the following assertion generalising Theorem 11.5.1 is valid. Let $\tau_1$ be an arbitrary random variable and the variables $\tau_k^* := \tau_{k+1} \overset{d}{=} \tau$, $k \ge 1$, satisfy the conditions of Theorem 11.5.1. Then (11.5.1) holds true.

The proof of this assertion is quite similar to the proofs of the corresponding assertions in the above mentioned remarks, and we leave it to the reader.


These assertions show that replacement of one or several terms in the consid- 
ered sequences of random variables with arbitrary variables changes nothing in the 
established convergence relations. (The exception is Theorem 11.1.1, in which the condition $\mathbf{E}\min(0, \tau_1) > -\infty$ is essential.) This fact will be used in Chap. 13 devoted to Markov chains.


11.5.2 The Strong Law of Large Numbers for Generalised 
Renewal Processes 


Now let a sequence of independent identically distributed random vectors $(\tau_j, \xi_j) \overset{d}{=} (\tau, \xi)$ be given and $S_k = \sum_{j=1}^{k} \xi_j$. Our goal is to obtain an analogue of Theorem 11.5.1 for generalised renewal processes $S(t) = S_{\eta(t)}$ (see Sect. 10.6).


Theorem 11.5.2 If $\tau > 0$ and there exist $a := \mathbf{E}\tau$ and $a_\xi := \mathbf{E}\xi$ then

$$\frac{S(t)}{t} \xrightarrow{\text{a.s.}} \frac{a_\xi}{a}$$

as $t \to \infty$.


The proof of the theorem is almost obvious. It follows from the representation

$$\frac{S(t)}{t} = \frac{S_{\eta(t)}}{\eta(t)} \cdot \frac{\eta(t)}{t}$$

and the a.s. convergence relations

$$\frac{S_n}{n} \xrightarrow{\text{a.s.}} a_\xi, \qquad \frac{\eta(t)}{t} \xrightarrow{\text{a.s.}} \frac{1}{a}.$$

Note that the independence of the components $\tau$ and $\xi$ is not assumed here.
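The convergence $S(t)/t \to a_\xi/a$ can be observed on a simulated trajectory. The following sketch is an illustration only; the distributions are assumptions chosen here ($\tau$ uniform on $(0, 2)$, so $a = 1$, and $\xi = \tau + $ a normal variable with mean 1, so $a_\xi = 2$ and the components are dependent), and the function name is hypothetical.

```python
import random


def generalised_renewal_ratio(t, seed=0):
    """Simulate S(t)/t, where S(t) = S_{eta(t)} and eta(t) = min{k : T_k > t}."""
    rng = random.Random(seed)
    T = 0.0   # T_k = tau_1 + ... + tau_k
    S = 0.0   # S_k = xi_1 + ... + xi_k
    while T <= t:                          # stops at the first k with T_k > t
        tau = rng.uniform(0.0, 2.0)        # E tau = 1
        xi = tau + rng.gauss(1.0, 1.0)     # E xi = 2, dependent on tau
        T += tau
        S += xi
    return S / t


for t in (10.0, 1000.0, 20000.0):
    print(t, generalised_renewal_ratio(t))   # values should approach a_xi / a = 2
```

The dependence between $\tau$ and $\xi$ is deliberate, since the theorem does not require the components to be independent.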


Chapter 12 
Random Walks and Factorisation Identities 


Abstract In this chapter, several remarkable and rather useful relations establishing 
interconnections between different characteristics of random walks (the so-called 
boundary functionals) are derived, and the arising problems are related to the sim- 
plest boundary problems of Complex Analysis. Section 12.1 introduces the concept 
of factorisation identity and derives two fundamental identities of that kind. Some 
consequences of these identities, including the trichotomy theorem on the oscilla- 
tory behaviour of random walks and a one-sided version of the Strong Law of Large 
Numbers are presented in Sect. 12.2. Pollaczek—Spitzer’s identity and an identity 
for the global maximum of the random walk are derived in Sect. 12.3, followed 
by illustrating these results by examples from the ruin theory and the theory of 
queueing systems in Sect. 12.4. Sections 12.5 and 12.6 are devoted to studying the 
cases where factorisation components can be obtained in explicit form and so closed 
form expressions are available for the distributions of a number of important bound- 
ary functionals. Sections 12.7 and 12.8 employ factorisation identities to derive the 
asymptotic properties of the distribution of the excess of a random walk of a high 
level and that of the global maximum of the walk, and also to analyse the distribution 
of the first passage time. 


In the present chapter we derive several remarkable and rather useful relations es- 
tablishing interconnections between different characteristics of random walks (the 
so-called boundary functionals) and also relate the arising problems with the sim- 
plest boundary problems of complex analysis. 


12.1 Factorisation Identities 


12.1.1 Factorisation 


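On the plane of a complex variable $\lambda$, denote by $\Pi$ the real axis $\operatorname{Im}\lambda = 0$ and by $\Pi_+$ ($\Pi_-$) the half-plane $\operatorname{Im}\lambda > 0$ ($\operatorname{Im}\lambda < 0$). Let $f(\lambda)$ be a continuous function defined on $\Pi$.

Definition 12.1.1 If there exists a representation

$$f(\lambda) = f_+(\lambda) f_-(\lambda), \quad \lambda \in \Pi, \tag{12.1.1}$$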


A.A. Borovkov, Probability Theory, Universitext, 333 
DOI 10.1007/978-1-4471-5201-9_12, © Springer-Verlag London 2013 


where $f_\pm$ are analytic in the domains $\Pi_\pm$ and continuous on $\Pi_\pm \cup \Pi$, then we will say that the function $f$ allows factorisation. The functions $f_\pm$ are called factorisation components (positive and negative, respectively).

Further, denote by $K$ the class of functions $f$ defined on $\Pi$ that are continuous and such that

$$\sup_{\lambda \in \Pi} |f(\lambda)| < \infty, \qquad \inf_{\lambda \in \Pi} |f(\lambda)| > 0. \tag{12.1.2}$$

Similarly we define the classes $K_\pm$ of functions analytic in $\Pi_\pm$ and continuous on $\Pi_\pm \cup \Pi$, such that

$$\sup_{\lambda \in \Pi_\pm} |f_\pm(\lambda)| < \infty, \qquad \inf_{\lambda \in \Pi_\pm} |f_\pm(\lambda)| > 0. \tag{12.1.3}$$


Definition 12.1.2 If, for an $f \in K$, there exists a representation (12.1.1), where $f_\pm \in K_\pm$, then we will say that the function $f$ allows canonical factorisation. Representations of the form

$$f(\lambda) = f_+(\lambda) f_-(\lambda) f_0, \qquad f(\lambda) = \frac{f_+(\lambda) f_0}{f_-(\lambda)}, \quad \lambda \in \Pi,$$

where $f_0 = \text{const}$ and $f_\pm \in K_\pm$, are also called canonical factorisations.


Lemma 12.1.1 The components $f_\pm$ of a canonical factorisation of a function $f \in K$ are defined uniquely up to a constant factor.


Proof Together with the canonical factorisation (12.1.1), let there exist another canonical factorisation

$$f(\lambda) = g_+(\lambda) g_-(\lambda), \quad \lambda \in \Pi.$$

Then

$$f_+(\lambda) f_-(\lambda) = g_+(\lambda) g_-(\lambda), \quad \lambda \in \Pi,$$

and, by (12.1.2), we can divide both sides of the equality by $g_+(\lambda) f_-(\lambda)$. We get

$$\frac{f_+(\lambda)}{g_+(\lambda)} = \frac{g_-(\lambda)}{f_-(\lambda)},$$

where, by virtue of (12.1.2), the function $f_+(\lambda)/g_+(\lambda)$ ($g_-(\lambda)/f_-(\lambda)$) belongs to the class $K_+$ ($K_-$). We have obtained that the function $f_+(\lambda)/g_+(\lambda)$, analytic in $\Pi_+$, can be analytically continued over the line $\Pi$ onto the half-plane $\Pi_-$ (to the function $g_-(\lambda)/f_-(\lambda)$). After such a continuation, in view of (12.1.3), this function remains bounded on the whole complex plane. By Liouville’s theorem, bounded entire functions must be constant, i.e. there exists a constant $c$ such that, on the whole plane,

$$\frac{f_+(\lambda)}{g_+(\lambda)} = \frac{g_-(\lambda)}{f_-(\lambda)} = c$$

holds, so $f_+(\lambda) = c\,g_+(\lambda)$, $f_-(\lambda) = c^{-1} g_-(\lambda)$. The lemma is proved.


The factorisation problem consists in finding conditions under which a given 
function f admits a factorisation, and in finding the components of the factorisation. 
This problem has a number of important applications to solving integral equations 
and is a version of the well-known Cauchy—Riemann boundary-value problem in 
complex function theory. We will see later that factorisation is also an important 
tool for studying the so-called boundary problems in probability theory. 


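12.1.2 The Canonical Factorisation of the Function $f_z(\lambda) = 1 - z\varphi(\lambda)$


Let $(\Omega, \mathfrak{F}, \mathbf{P})$ be a probability space on which a sequence $\{\xi_k\}_{k=1}^{\infty}$ of independent identically distributed ($\xi_k \overset{d}{=} \xi$) random variables is given. Put, as before, $S_k := \sum_{j=1}^{k} \xi_j$ and $S_0 = 0$. The sequence $\{S_k\}$ forms a random walk.

First of all, note that the function

$$f_z(\lambda) := 1 - z\varphi(\lambda), \qquad \varphi(\lambda) := \mathbf{E}e^{i\lambda\xi}, \quad \lambda \in \Pi,$$

belongs to $K$ for all $z$ with $|z| < 1$ (here $z$ is a complex-valued parameter). This follows from the inequalities $|\varphi(\lambda)| \le 1$ for $\lambda \in \Pi$ and $|z\varphi(\lambda)| \le |z| < 1$.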

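Theorem 12.1.1 (The first factorisation identity) For $|z| < 1$, the function $f_z(\lambda)$ admits the canonical factorisation

$$f_z(\lambda) = f_{z+}(\lambda)\, C(z)\, f_{z-}(\lambda), \quad \lambda \in \Pi, \tag{12.1.4}$$

where

$$\begin{aligned}
f_{z+}(\lambda) &= \exp\Big\{-\sum_{k=1}^{\infty} \frac{z^k}{k}\,\mathbf{E}\big(e^{i\lambda S_k};\ S_k > 0\big)\Big\} \in K_+, \\
f_{z-}(\lambda) &= \exp\Big\{-\sum_{k=1}^{\infty} \frac{z^k}{k}\,\mathbf{E}\big(e^{i\lambda S_k};\ S_k < 0\big)\Big\} \in K_-, \\
C(z) &= \exp\Big\{-\sum_{k=1}^{\infty} \frac{z^k}{k}\,\mathbf{P}(S_k = 0)\Big\}.
\end{aligned} \tag{12.1.5}$$


Proof Since $|z| < 1$, $\ln(1 - z\varphi(\lambda))$ exists, understood in the principal value sense. The following equalities give the desired decomposition:

$$\begin{aligned}
f_z(\lambda) &= e^{\ln(1 - z\varphi(\lambda))} = \exp\Big\{-\sum_{k=1}^{\infty} \frac{z^k \varphi^k(\lambda)}{k}\Big\} = \exp\Big\{-\sum_{k=1}^{\infty} \frac{z^k}{k}\,\mathbf{E}e^{i\lambda S_k}\Big\} \\
&= \exp\Big\{-\sum_{k=1}^{\infty} \frac{z^k}{k}\,\mathbf{E}\big(e^{i\lambda S_k};\ S_k > 0\big)\Big\} \exp\Big\{-\sum_{k=1}^{\infty} \frac{z^k}{k}\,\mathbf{P}(S_k = 0)\Big\} \\
&\quad \times \exp\Big\{-\sum_{k=1}^{\infty} \frac{z^k}{k}\,\mathbf{E}\big(e^{i\lambda S_k};\ S_k < 0\big)\Big\}.
\end{aligned}$$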


Show that $f_{z+}(\lambda) \in K_+$. Indeed, the function $\mathbf{E}(e^{i\lambda S_k};\ S_k > 0)$, for every $k$ and $\lambda \in \Pi_+ \cup \Pi$, does not exceed 1 in the absolute value, is analytic in $\Pi_+$ and is continuous on $\Pi_+ \cup \Pi$. Analyticity follows from the differentiability of this function at any point $\lambda \in \Pi_+$ (see also Property 6 of ch.f.s in Sect. 7.1). The function $\ln f_{z+}(\lambda)$ is a uniformly converging series of functions analytic in $\Pi_+$, and hence possesses the same properties together with the function $f_{z+}(\lambda)$. The same can be said about the continuity on $\Pi_+ \cup \Pi$.

That $f_{z-}(\lambda) \in K_-$ is established in a similar way. The theorem is proved.
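Identity (12.1.4) is easy to check numerically on the real axis in a simple case. The sketch below is an illustration under assumptions not made in the text: $\xi = \pm 1$ with probabilities $p$ and $1 - p$ (so that $S_k$ has a shifted binomial distribution), and the three series in (12.1.5) are truncated after a fixed number of terms. The function names are hypothetical.

```python
import cmath
from math import comb


def factor_sums(lam, z, p=0.5, terms=80):
    """Truncated series sum_k z^k/k * E(e^{i lam S_k}; S_k in A), A = {>0}, {=0}, {<0}.

    Assumption for illustration: steps are +1 with probability p, -1 otherwise.
    """
    q = 1.0 - p
    pos = zero = neg = 0j
    for k in range(1, terms + 1):
        weight = z ** k / k
        for ups in range(k + 1):
            s = 2 * ups - k                                   # value of S_k
            prob = comb(k, ups) * p ** ups * q ** (k - ups)   # P(S_k = s)
            term = weight * prob * cmath.exp(1j * lam * s)
            if s > 0:
                pos += term
            elif s < 0:
                neg += term
            else:
                zero += term
    return pos, zero, neg


def first_identity(lam, z, p=0.5, terms=80):
    """Return (f_{z+} * C(z) * f_{z-}, 1 - z * phi(lam)) for comparison."""
    pos, zero, neg = factor_sums(lam, z, p, terms)
    product = cmath.exp(-pos) * cmath.exp(-zero) * cmath.exp(-neg)
    phi = p * cmath.exp(1j * lam) + (1.0 - p) * cmath.exp(-1j * lam)
    return product, 1 - z * phi


print(first_identity(0.7, 0.5))
print(first_identity(-1.3, 0.3 + 0.4j, p=0.3))
```

For $|z| = 0.5$ the neglected tail of each series is of order $0.5^{80}$, so the two returned numbers should coincide up to rounding.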


12.1.3 The Second Factorisation Identity 


The second factorisation identity is associated with the so-called boundary functionals of the random walk $\{S_k\}$. On the main probability space $(\Omega, \mathfrak{F}, \mathbf{P})$ we define, together with $\{\xi_k\}$, the random variable

$$\eta_+^0 := \min\{k \ge 1;\ S_k \ge 0\}.$$

This is the first-passage time to zero level. For the elementary events such that all $S_k < 0$, $k \ge 1$, we put $\eta_+^0 := \infty$. Like the random variable $\eta(0)$ in Sect. 10.1, the variable $\eta_+^0$ is a Markov time.

The random variable $\chi_+^0 := S_{\eta_+^0}$ is called the first nonnegative sum. It is defined on the set $\{\eta_+^0 < \infty\}$ only.

The first passage time of zero level from the right

$$\eta_-^0 := \min\{k \ge 1;\ S_k \le 0\}$$

possesses quite similar properties, and so does the first nonpositive sum $\chi_-^0 := S_{\eta_-^0}$.

Studying the properties of the introduced random variables, which are called boundary functionals of the random walk $\{S_k\}$, is of significant independent interest. For instance, the variable $\eta_+^0$ is a stopping time, and understanding its nature is essential for studying stopping times in many more complex problems (see e.g. the problems of the renewal theory in Chap. 10, the problems of statistical control described in Sect. 4.4 and so on). Moreover, the variables $\chi_+^0$ and $\chi_-^0$ will be needed to describe the extrema

$$\zeta := \sup(S_1, S_2, \ldots) \quad \text{and} \quad \gamma := \inf(S_1, S_2, \ldots),$$

which are also termed boundary functionals and play an important role in the problems of mathematical statistics, queueing theory (see Sect. 12.4), etc.

Put, as before, $\varphi(\lambda) := \varphi_\xi(\lambda) = \mathbf{E}e^{i\lambda\xi}$.


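Theorem 12.1.2 (The second factorisation identity) For the ch.f. of the joint distributions of the introduced random variables, for $|z| < 1$ and $\operatorname{Im}\lambda = 0$, the canonical factorisation

$$f_z(\lambda) := 1 - z\varphi(\lambda) = \big[1 - \mathbf{E}\big(e^{i\lambda\chi_+^0} z^{\eta_+^0};\ \eta_+^0 < \infty\big)\big]\, D^{-1}(z)\, \big[1 - \mathbf{E}\big(e^{i\lambda\chi_-^0} z^{\eta_-^0};\ \eta_-^0 < \infty\big)\big]$$

of $f_z(\lambda)$ holds true, where

$$D(z) := 1 - \mathbf{E}\big(z^{\eta_+^0};\ \chi_+^0 = 0,\ \eta_+^0 < \infty\big) = 1 - \mathbf{E}\big(z^{\eta_-^0};\ \chi_-^0 = 0,\ \eta_-^0 < \infty\big).$$

Proof Set $\zeta_n := \max\{S_1, \ldots, S_n\}$. We have

$$\begin{aligned}
\varphi^n(\lambda) = \mathbf{E}e^{i\lambda S_n} &= \sum_{k=1}^{n} \mathbf{E}\big(e^{i\lambda S_n};\ \eta_+^0 = k\big) + \mathbf{E}\big(e^{i\lambda S_n};\ \zeta_n < 0\big) \\
&= \sum_{k=1}^{n} \mathbf{E}\big(e^{i\lambda(S_n - S_k)} e^{i\lambda S_k} I(\eta_+^0 = k)\big) + M_n,
\end{aligned} \tag{12.1.6}$$

where $M_n = \mathbf{E}(e^{i\lambda S_n};\ \zeta_n < 0)$ and $I(A)$ is the indicator of the event $A$. For each fixed $k$, the random variables $S_n - S_k$ and $S_k I(\eta_+^0 = k) = \chi_+^0 I(\eta_+^0 = k)$ are independent. Hence,

$$\varphi^n(\lambda) = \sum_{k=1}^{n} \varphi^{n-k}(\lambda)\, \mathbf{E}\big(e^{i\lambda\chi_+^0};\ \eta_+^0 = k\big) + M_n.$$

Now multiply both sides by $z^n$, $n = 0, 1, \ldots$, and then sum up over $n$. We will use the convention that, for $n = 0$,

$$M_0 = 1, \qquad \sum_{k=1}^{n} = 0.$$

For the convolution of two sequences $c_n = \sum_{k=0}^{n} a_k b_{n-k}$, we have

$$\sum_{n=0}^{\infty} c_n z^n = \sum_{n=0}^{\infty} a_n z^n \sum_{n=0}^{\infty} b_n z^n,$$

provided that the series in this equality converge absolutely. Since $|z| < 1$ and $|\varphi(\lambda)| \le 1$ for $\operatorname{Im}\lambda = 0$, one has

$$\begin{aligned}
\sum_{n=0}^{\infty} z^n \varphi^n(\lambda) = \frac{1}{1 - z\varphi(\lambda)} &= \sum_{n=0}^{\infty} \Big[\sum_{k=1}^{n} z^{n-k}\varphi^{n-k}(\lambda)\, \mathbf{E}\big(e^{i\lambda\chi_+^0} z^k;\ \eta_+^0 = k\big) + z^n M_n\Big] \\
&= \sum_{k=1}^{\infty} \mathbf{E}\big(e^{i\lambda\chi_+^0} z^k;\ \eta_+^0 = k\big) \sum_{n=0}^{\infty} z^n \varphi^n(\lambda) + \sum_{n=0}^{\infty} z^n M_n \\
&= \frac{1}{1 - z\varphi(\lambda)}\, \mathbf{E}\big(e^{i\lambda\chi_+^0} z^{\eta_+^0};\ \eta_+^0 < \infty\big) + \sum_{n=0}^{\infty} z^n M_n,
\end{aligned}$$

or, which is the same,

$$f_z(\lambda) = 1 - z\varphi(\lambda) = \frac{1 - \mathbf{E}\big(e^{i\lambda\chi_+^0} z^{\eta_+^0};\ \eta_+^0 < \infty\big)}{\sum_{n=0}^{\infty} z^n\, \mathbf{E}\big(e^{i\lambda S_n};\ \zeta_n < 0\big)} =: \frac{a_{z+}(\lambda)}{a_{z-}(\lambda)}, \tag{12.1.7}$$

where $a_{z\pm}(\lambda)$ denote the numerator and denominator of the ratio obtained for $f_z(\lambda)$.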


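It is easy to see that, if we put

$$\gamma_n := \min(S_1, \ldots, S_n)$$

then, repeating the above arguments, we will arrive at the equality

$$f_z(\lambda) = \frac{1 - \mathbf{E}\big(e^{i\lambda\chi_-^0} z^{\eta_-^0};\ \eta_-^0 < \infty\big)}{\sum_{n=0}^{\infty} z^n\, \mathbf{E}\big(e^{i\lambda S_n};\ \gamma_n > 0\big)} =: \frac{b_{z-}(\lambda)}{b_{z+}(\lambda)}, \tag{12.1.8}$$

where, similarly to the above, $b_{z\mp}(\lambda)$, respectively, denote the numerator and denominator in relation (12.1.8).

Now we show that $a_{z\pm}(\lambda) \in K$ and $b_{z\pm}(\lambda) \in K$ for $|z| < 1$. Indeed, for $|z| < 1$ and $\operatorname{Im}\lambda = 0$,

$$\big|\mathbf{E}\big(e^{i\lambda\chi_+^0} z^{\eta_+^0};\ \eta_+^0 < \infty\big)\big| \le \mathbf{E}\big(|z|^{\eta_+^0};\ \eta_+^0 < \infty\big) < 1$$

and therefore

$$\sup_{\lambda \in \Pi} |a_{z+}(\lambda)| < \infty, \qquad \inf_{\lambda \in \Pi} |a_{z+}(\lambda)| > 0.$$

Since $f_z(\lambda) \in K$, this also implies that $a_{z-}(\lambda) \in K$. In the same way we obtain that $b_{z\pm}(\lambda) \in K$. By equating the right-hand sides of (12.1.7) and (12.1.8) and multiplying them by $a_{z-}(\lambda) b_{z+}(\lambda)$, we get

$$a_{z+}(\lambda) b_{z+}(\lambda) = a_{z-}(\lambda) b_{z-}(\lambda), \quad \lambda \in \Pi. \tag{12.1.9}$$

Further, the functions $a_{z+}(\lambda)$ and $b_{z+}(\lambda)$ are bounded and analytic in $\Pi_+$ for the same reasons as the function $f_{z+}(\lambda)$ (see the proof of Theorem 12.1.1). Similarly, $a_{z-}(\lambda)$ and $b_{z-}(\lambda)$ are bounded and analytic in $\Pi_-$. We obtain that the function $a_{z+}(\lambda) b_{z+}(\lambda)$ is bounded and analytic in $\Pi_+$ and, by (12.1.9), has an entire bounded analytic continuation over the boundary $\Pi$ to the whole complex plane. This means that this function necessarily equals a constant $c$, and $b_{z+}(\lambda) = c\,a_{z+}^{-1}(\lambda) \in K_+$, $a_{z-}(\lambda) = c\,b_{z-}^{-1}(\lambda) \in K_-$, so relations (12.1.7) and (12.1.8) deliver a canonical factorisation of $f_z(\lambda)$.

Further, $e^{i\lambda x} \to 0$ as $\operatorname{Im}\lambda \to -\infty$, $x < 0$, and therefore

$$b_{z-}(-i\infty) = 1 - \mathbf{E}\big(z^{\eta_-^0};\ \chi_-^0 = 0,\ \eta_-^0 < \infty\big), \qquad a_{z-}(-i\infty) = 1,$$

$$a_{z-}(\lambda) b_{z-}(\lambda) = a_{z-}(-i\infty)\, b_{z-}(-i\infty) = 1 - \mathbf{E}\big(z^{\eta_-^0};\ \chi_-^0 = 0,\ \eta_-^0 < \infty\big) = D(z).$$

Substituting into (12.1.7) the value $a_{z-}(\lambda) = D(z)/b_{z-}(\lambda)$ derived from this equality, we obtain the assertion of the theorem. The second relation for $D(z)$ follows from the equality $D(z) = a_{z+}(i\infty)\, b_{z+}(i\infty)$. The theorem is proved.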


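In the proof of Theorem 12.1.2 we used, in formula (12.1.6), a decomposition of $\mathbf{E}e^{i\lambda S_n}$ into summands corresponding to the disjoint events

$$\Big\{\bigcup_{k=1}^{n} \{\eta_+^0 = k\}\Big\} = \{\zeta_n \ge 0\} \quad \text{and} \quad \{\zeta_n < 0\}.$$

But the scheme of the proof will still work if we consider the partition of $\Omega$ into the events $\{\zeta_n > 0\}$ and $\{\zeta_n \le 0\}$. In order to do this, we introduce the random variables

$$\eta_+ := \min\{k : S_k > 0\}$$

($\eta_+ = \infty$ if $\zeta \le 0$; note that $\eta_+ = \eta(0)$ in the notation of Sect. 10.1),

$$\chi_+ := S_{\eta_+}, \qquad \eta_- := \min\{k : S_k < 0\} \quad (\eta_- = \infty \text{ if } \gamma \ge 0), \qquad \chi_- := S_{\eta_-}.$$

The variable $\eta_+$ ($\eta_-$) is called the time of the first positive (negative) sum $\chi_+$ ($\chi_-$). Now we can write, together with equalities (12.1.7) and (12.1.8), the relations

$$f_z(\lambda) = 1 - z\varphi(\lambda) = \frac{1 - \mathbf{E}\big(e^{i\lambda\chi_+} z^{\eta_+};\ \eta_+ < \infty\big)}{\sum_{n=0}^{\infty} z^n\, \mathbf{E}\big(e^{i\lambda S_n};\ \zeta_n \le 0\big)} = \frac{1 - \mathbf{E}\big(e^{i\lambda\chi_-} z^{\eta_-};\ \eta_- < \infty\big)}{\sum_{n=0}^{\infty} z^n\, \mathbf{E}\big(e^{i\lambda S_n};\ \gamma_n \ge 0\big)}. \tag{12.1.10}$$

Combining these relations with (12.1.7) and (12.1.8), we will use below the same argument as above to prove the following assertion.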


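Theorem 12.1.3

$$\begin{aligned}
1 - \mathbf{E}\big(e^{i\lambda\chi_+^0} z^{\eta_+^0};\ \eta_+^0 < \infty\big) &= D(z)\big[1 - \mathbf{E}\big(e^{i\lambda\chi_+} z^{\eta_+};\ \eta_+ < \infty\big)\big], \\
1 - \mathbf{E}\big(e^{i\lambda\chi_-^0} z^{\eta_-^0};\ \eta_-^0 < \infty\big) &= D(z)\big[1 - \mathbf{E}\big(e^{i\lambda\chi_-} z^{\eta_-};\ \eta_- < \infty\big)\big].
\end{aligned} \tag{12.1.11}$$

Here the function $D(z)$ defined in Theorem 12.1.2 also satisfies the relations

$$D^{-1}(z) = \sum_{n=0}^{\infty} z^n\, \mathbf{P}(S_n = 0,\ \zeta_n \le 0) = \sum_{n=0}^{\infty} z^n\, \mathbf{P}(S_n = 0,\ \gamma_n \ge 0). \tag{12.1.12}$$

Clearly, from Theorem 12.1.3 one can obtain some other versions of the factorisation identity. For instance, one has

$$f_z(\lambda) = \big[1 - \mathbf{E}\big(e^{i\lambda\chi_+} z^{\eta_+};\ \eta_+ < \infty\big)\big]\big[1 - \mathbf{E}\big(e^{i\lambda\chi_-^0} z^{\eta_-^0};\ \eta_-^0 < \infty\big)\big]. \tag{12.1.13}$$

Representations (12.1.12) for $D(z)$ imply, in particular, that

$$\mathbf{P}(S_n = 0,\ \zeta_n \le 0) = \mathbf{P}(S_n = 0,\ \gamma_n \ge 0)$$

and that $D(z) = 1$ if $\mathbf{P}(S_n = 0) = 0$ for all $n \ge 1$.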
Proof of Theorem 12.1.3 Let us derive the first relation in (12.1.11). Comparing (12.1.8) with (12.1.10) we find, as above, that

$$\big[1 - \mathbf{E}\big(e^{i\lambda\chi_+} z^{\eta_+};\ \eta_+ < \infty\big)\big]\, b_{z+}(\lambda) = \text{const} = 1, \tag{12.1.14}$$

since the product equals 1 for $\lambda = i\infty$. Therefore we obtain (12.1.13) by virtue of (12.1.8). It remains to compare (12.1.13) with the identity of Theorem 12.1.2.

Expressions (12.1.12) for $D(z)$ follow if we recall (see (12.1.8) and (12.1.10)) that the left-hand side of (12.1.14) equals

$$\Big[\sum_{n=0}^{\infty} z^n\, \mathbf{E}\big(e^{i\lambda S_n};\ \zeta_n \le 0\big)\Big]\big[1 - \mathbf{E}\big(e^{i\lambda\chi_-^0} z^{\eta_-^0};\ \eta_-^0 < \infty\big)\big].$$

Since this product also equals 1, letting $\lambda = -i\infty$ here and in the second identity of (12.1.11) we get the first equality in (12.1.12). The second equality is proved in a similar way.


Remark 12.1.1 It is important to note that Theorems 12.1.2 and 12.1.3, as well as proving the existence of the identities, also provide a means of finding the characteristic function of the joint distribution of $\chi$ and $\eta$. That is, if we manage somehow to get a representation for $f_z(\lambda) = 1 - z\varphi(\lambda)$ of the form $h_{z+}(\lambda) h_{z-}(\lambda)$, where $h_{z\pm}(\lambda) \in K_\pm$, then by uniqueness of the canonical factorisation we can, for instance, claim that, up to a constant factor, the function $1 - \mathbf{E}(e^{i\lambda\chi_+} z^{\eta_+};\ \eta_+ < \infty)$ coincides with $h_{z+}(\lambda)$. For examples of how such arguments can be used, see Sects. 12.5 and 12.6.


12.2 Some Consequences of Theorems 12.1.1-12.1.3 


12.2.1 Direct Consequences 


Theorems 12.1.1—12.1.3 (and also their modifications of the form (12.1.13)) and 
the uniqueness of the canonical factorisation (see Lemma 12.1.1) directly imply the 
next result. 


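Corollary 12.2.1 In the notation of Theorems 12.1.1 and 12.1.2 one has the following equalities.

$$\begin{aligned}
1 - \mathbf{E}\big(e^{i\lambda\chi_+} z^{\eta_+};\ \eta_+ < \infty\big) &= f_{z+}(\lambda); \\
D(z) &= C(z); \\
1 - \mathbf{E}\big(e^{i\lambda\chi_-} z^{\eta_-};\ \eta_- < \infty\big) &= f_{z-}(\lambda).
\end{aligned}$$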
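The equality $D(z) = C(z)$ can be checked numerically in a simple special case. The sketch below assumes, purely for illustration, the symmetric walk with steps $\pm 1$ and a real $z \in (0, 1)$; in that case both quantities are known to equal $(1 + \sqrt{1 - z^2})/2$. The function names and the truncation at 200 terms are choices made here.

```python
from math import comb, exp, sqrt


def C_series(z, terms=200):
    """C(z) = exp(-sum_k z^k/k * P(S_k = 0)) for the symmetric +-1 walk (truncated)."""
    total = 0.0
    for k in range(2, terms + 1, 2):           # P(S_k = 0) = 0 for odd k
        total += z ** k / k * (comb(k, k // 2) / 2 ** k)
    return exp(-total)


def D_direct(z, terms=200):
    """D(z) = 1 - E(z^eta; chi = 0, eta < inf) for eta = min{k >= 1: S_k >= 0} (truncated)."""
    dist = {1: 0.5}     # dist[m] = P(S_1 < 0, ..., S_k < 0, S_k = -m), here k = 1
    hit = 0.0           # accumulates sum_k z^k P(eta = k, S_k = 0)
    for k in range(2, terms + 1):
        hit += z ** k * 0.5 * dist.get(1, 0.0)   # step up from -1 to 0 at time k
        nxt = {}
        for m, pr in dist.items():
            nxt[m + 1] = nxt.get(m + 1, 0.0) + 0.5 * pr
            if m > 1:
                nxt[m - 1] = nxt.get(m - 1, 0.0) + 0.5 * pr
        dist = nxt
    return 1.0 - hit


z = 0.6
print(C_series(z), D_direct(z), (1 + sqrt(1 - z * z)) / 2)
```

The first function uses only the marginal probabilities $\mathbf{P}(S_k = 0)$, while the second follows the trajectories that stay negative until they first return to zero, so the agreement of the two outputs is a genuine check of the corollary in this case.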


Now we will obtain, as corollaries of Theorems 12.1.1—12.1.3, some further iden- 
tities in which the parameter z is fixed and equal to 1. 


Corollary 12.2.2 Letting $z \to 1$ in (12.1.13) we obtain

$$f_1(\lambda) := 1 - \varphi(\lambda) = \big[1 - \mathbf{E}\big(e^{i\lambda\chi_+};\ \eta_+ < \infty\big)\big]\big[1 - \mathbf{E}\big(e^{i\lambda\chi_-^0};\ \eta_-^0 < \infty\big)\big]. \tag{12.2.1}$$

It is obvious that one can similarly write other identities of such type corresponding to the identities that can be derived from Theorems 12.1.1-12.1.3.


Clearly, identity (12.2.1) delivers a factorisation of the function $f_1(\lambda) = 1 - \varphi(\lambda)$, but this factorisation is not canonical since $f_1(0) = 0$ and $f_1(\lambda) \notin K$.


Corollary 12.2.3 If there exists $\mathbf{E}\xi = a < 0$ then $\mathbf{P}(\eta_-^0 < \infty) = 1$, $\mathbf{E}\chi_-^0$ exists, and

$$\mathbf{P}(\zeta \le 0) = a / \mathbf{E}\chi_-^0 > 0.$$

Proof The first relation follows from the law of large numbers, because

$$\mathbf{P}(\eta_-^0 > n) \le \mathbf{P}(S_n > 0) \to 0$$

as $n \to \infty$. Therefore, in the case under consideration, one has

$$\mathbf{E}\big(e^{i\lambda\chi_-^0};\ \eta_-^0 < \infty\big) = \mathbf{E}e^{i\lambda\chi_-^0}.$$

The existence of $\mathbf{E}\chi_-^0$ follows from Wald’s identity $\mathbf{E}\chi_-^0 = a\mathbf{E}\eta_-^0$ and the theorems of Chap. 10, which imply that $\mathbf{E}\eta_-^0 \le \mathbf{E}\eta_- < \infty$, since $\mathbf{E}\eta_-$ is the value of the corresponding renewal function at 0.

Finally, dividing both sides of the identity in Corollary 12.2.2 by $\lambda$ and taking the limit as $\lambda \to 0$, we obtain

$$a = \big(1 - \mathbf{P}(\eta_+ < \infty)\big)\mathbf{E}\chi_-^0 = \mathbf{P}(\zeta \le 0)\,\mathbf{E}\chi_-^0.$$

It is interesting to note that, as a consequence of this assertion, we can obtain the strong law of large numbers. Indeed, since $\{\zeta < \infty\}$ is a tail event and $\mathbf{P}(\zeta < \infty) \ge \mathbf{P}(\zeta \le 0)$, Corollary 12.2.3 implies that $\mathbf{P}(\zeta < \infty) = 1$ for $a < 0$. This means that the assertion of Theorem 10.5.3 holds, and it was this assertion that the strong law of large numbers was derived from.

Based on factorisation identities, we will obtain below a generalisation of this law.

In the remaining part of this chapter, to avoid trivial complications, we will be assuming that $\xi$ takes, with positive probability, both positive and negative values.


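Corollary 12.2.4 If $a = \mathbf{E}\xi = 0$ then $\mathbf{P}(\eta_+ < \infty) = \mathbf{P}(\eta_-^0 < \infty) = 1$, so that

$$1 - \varphi(\lambda) = \big(1 - \mathbf{E}e^{i\lambda\chi_+}\big)\big(1 - \mathbf{E}e^{i\lambda\chi_-^0}\big). \tag{12.2.2}$$

If, moreover, $\mathbf{E}\xi^2 = \sigma^2 < \infty$ then there exist $\mathbf{E}\chi_+$ and $\mathbf{E}\chi_-^0$, and

$$\mathbf{E}\chi_+\, \mathbf{E}\chi_-^0 = -\frac{\sigma^2}{2}.$$


Proof Consider the sequence $\tilde{\xi}_k = \xi_k - \varepsilon$, $\varepsilon > 0$. Denoting by $\tilde{\chi}_-^0$ and $\tilde{\zeta}$ the corresponding characteristics for the newly introduced sequence, we obtain by Corollary 12.2.3 that

$$\mathbf{P}(\zeta \le 0) \le \mathbf{P}(\tilde{\zeta} \le 0) = \frac{\mathbf{E}\tilde{\xi}}{\mathbf{E}\tilde{\chi}_-^0} = -\frac{\varepsilon}{\mathbf{E}\tilde{\chi}_-^0},$$

where

$$\mathbf{E}\tilde{\chi}_-^0 \le \mathbf{E}(\tilde{\xi};\ \tilde{\xi} \le 0) = \mathbf{E}(\xi - \varepsilon;\ \xi \le \varepsilon) \le \mathbf{E}(\xi;\ \xi \le 0) < 0.$$

So we can make the probability $\mathbf{P}(\zeta \le 0)$ arbitrarily small by choosing an appropriate $\varepsilon$, and thus $\mathbf{P}(\zeta \le 0) = \mathbf{P}(\eta_+ = \infty) = 0$. Similarly, we find that $\mathbf{P}(\gamma \ge 0) = 0$ and hence

$$\mathbf{P}(\eta_-^0 = \infty) \le \mathbf{P}(\eta_- = \infty) = \mathbf{P}(\gamma \ge 0) = 0.$$

The obtained relations and Corollary 12.2.2 yield identity (12.2.2).

In order to prove the second assertion of the corollary, divide both sides of identity (12.2.2) by $\lambda^2 = -(i\lambda)^2$ and let $\lambda \in \Pi$ tend to zero. Then the limit of the left-hand side will be equal to $\sigma^2/2$ (see (7.1.1)), whereas that of the right-hand side will be equal to $-\mathbf{E}\chi_+\mathbf{E}\chi_-^0$, where $\mathbf{E}\chi_+ > 0$, $|\mathbf{E}\chi_-^0| > 0$. The corollary is proved.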


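Corollary 12.2.5

1. We always have $\sum_{k=1}^{\infty} \dfrac{\mathbf{P}(S_k = 0)}{k} < \infty$.
2. The following three conditions are equivalent:

(a) $\mathbf{P}(\zeta < \infty) = 1$;
(b) $\mathbf{P}(\zeta \le 0) = \mathbf{P}(\eta_+ = \infty) > 0$;
(c) $\sum_{k=1}^{\infty} \dfrac{\mathbf{P}(S_k > 0)}{k} < \infty$ or $\sum_{k=1}^{\infty} \dfrac{\mathbf{P}(S_k \ge 0)}{k} < \infty$.


Proof To obtain the first assertion, one should let $z \to 1$ in the second equality in Corollary 12.2.1 and recall that

$$D(1) = 1 - \mathbf{P}(\chi_+^0 = 0,\ \eta_+^0 < \infty) \ge \mathbf{P}(\xi > 0) > 0.$$

The equivalence of (b) and (c) follows from the equality

$$1 - \mathbf{P}(\eta_+ < \infty) = \mathbf{P}(\zeta \le 0) = \exp\Big\{-\sum_{k=1}^{\infty} \frac{\mathbf{P}(S_k > 0)}{k}\Big\},$$

which is derived by putting $\lambda = 0$ and letting $z \to 1$ in the first identity of Corollary 12.2.1.

Now we will establish the equivalence of (a) and (b). If $\mathbf{P}(\zeta \le 0) > 0$ then $\mathbf{P}(\zeta < \infty) > 0$ and hence $\mathbf{P}(\zeta < \infty) = 1$, since $\{\zeta < \infty\}$ is a tail event. Conversely, let $\zeta$ be a proper random variable. Choose an $N$ such that $\mathbf{P}(\zeta \le N) > 0$, and $b > 0$ such that $k = N/b$ is an integer and $\mathbf{P}(\xi < -b) > 0$. Then

$$\{\zeta \le 0\} \supset \Big\{\xi_1 < -b, \ldots, \xi_k < -b,\ \sup_{j \ge 1}(-bk + \xi_{k+1} + \cdots + \xi_{k+j}) \le 0\Big\}.$$

Since the sequence $\xi_{k+1}, \xi_{k+2}, \ldots$ is distributed identically to $\xi_1, \xi_2, \ldots$, one has

$$\mathbf{P}(\zeta \le 0) \ge [\mathbf{P}(\xi < -b)]^k\, \mathbf{P}(\zeta \le bk) > 0.$$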
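The two expressions for $\mathbf{P}(\zeta \le 0)$ obtained above (the ratio $a/\mathbf{E}\chi_-^0$ of Corollary 12.2.3 and the exponential formula from the proof of Corollary 12.2.5) can be compared numerically in a simple case. The sketch below assumes, for illustration only, steps $\pm 1$ with $\mathbf{P}(\xi = 1) = p < 1/2$. Then $a = p - q$, and $\chi_-^0$ equals $-1$ if the first step is down and $0$ otherwise, so $a/\mathbf{E}\chi_-^0 = 1 - p/q$. Function names and truncation levels are choices made here.

```python
from math import comb, exp


def prob_never_positive(p, steps=400):
    """P(S_1 <= 0, ..., S_n <= 0) for n = steps, by exact dynamic programming."""
    q = 1.0 - p
    dist = {0: 1.0}    # dist[s] = P(S_1 <= 0, ..., S_k <= 0, S_k = s)
    for _ in range(steps):
        nxt = {}
        for s, pr in dist.items():
            nxt[s - 1] = nxt.get(s - 1, 0.0) + pr * q
            if s + 1 <= 0:                       # an up-step is allowed only below 0
                nxt[s + 1] = nxt.get(s + 1, 0.0) + pr * p
        dist = nxt
    return sum(dist.values())


def spitzer_exponent(p, terms=400):
    """exp(-sum_{k <= terms} P(S_k > 0) / k) for the +-1 walk."""
    q = 1.0 - p
    total = 0.0
    for k in range(1, terms + 1):
        # S_k > 0 means more than k/2 up-steps
        tail = sum(comb(k, ups) * p ** ups * q ** (k - ups)
                   for ups in range(k // 2 + 1, k + 1))
        total += tail / k
    return exp(-total)


p = 0.3
print(prob_never_positive(p), spitzer_exponent(p), 1 - p / (1 - p))
```

For $p = 0.3$ all three numbers should be close to $4/7$; the truncation errors are negligible here because $\mathbf{P}(S_k > 0)$ decays geometrically when the drift is negative.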


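Corollary 12.2.6

1. $\mathbf{P}(\zeta < \infty,\ \gamma > -\infty) = 0$.

2. If there exists $\mathbf{E}\xi = a < 0$ then

$$\mathbf{P}(\eta_+ < \infty) < 1, \qquad \mathbf{P}(\zeta < \infty,\ \gamma = -\infty) = 1,$$

$$\sum_{k=1}^{\infty} \frac{\mathbf{P}(S_k > 0)}{k} < \infty, \qquad \sum_{k=1}^{\infty} \frac{\mathbf{P}(S_k < 0)}{k} = \infty.$$

3. If there exists $\mathbf{E}\xi = a = 0$ then

$$\mathbf{P}(\zeta = \infty,\ \gamma = -\infty) = 1,$$

$$\sum_{k=1}^{\infty} \frac{\mathbf{P}(S_k > 0)}{k} = \infty, \qquad \sum_{k=1}^{\infty} \frac{\mathbf{P}(S_k < 0)}{k} = \infty.$$

Here we do not consider the case $a > 0$ since it is “symmetric” to the case $a < 0$.


Proof The first assertion follows from the fact that at least one of the two series $\sum_{k=1}^{\infty} \mathbf{P}(S_k < 0)/k$ and $\sum_{k=1}^{\infty} \mathbf{P}(S_k \ge 0)/k$ diverges. Therefore, by Corollary 12.2.5, either $\mathbf{P}(\gamma = -\infty) = 1$ or $\mathbf{P}(\zeta = \infty) = 1$.

The second and third assertions follow from Corollaries 12.2.3-12.2.5 in an obvious way.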

12.2.2 A Generalisation of the Strong Law of Large Numbers 


The above mentioned generalisation of the strong law of large numbers consists of 
the following. 


Theorem 12.2.1 (The one-sided law of large numbers) Convergence of the series

$$\sum_{k=1}^{\infty} \frac{\mathbf{P}(S_k > \varepsilon k)}{k}$$

for every $\varepsilon > 0$ is a necessary and sufficient condition for

$$\mathbf{P}\Big(\limsup_{n\to\infty} \frac{S_n}{n} \le 0\Big) = 1. \tag{12.2.3}$$

Proof Sufficiency. If the series converges then by Corollary 12.2.5 we have

$$\mathbf{P}\Big(\sup_k (S_k - \varepsilon k) < \infty\Big) = 1.$$

Hence $\{\varepsilon n\}$ is an upper sequence for $\{S_n\}$ and

$$\mathbf{P}\Big(\limsup_{k\to\infty} \frac{S_k}{k} \le \varepsilon\Big) = 1.$$

But since $\varepsilon$ is arbitrary, we see that

$$\mathbf{P}\Big(\limsup_{k\to\infty} \frac{S_k}{k} \le 0\Big) = 1.$$

Necessity. Conversely, if equality (12.2.3) holds then, for any $\varepsilon > 0$, with probability 1 we have $S_n/n < \varepsilon$ for all $n$ large enough. This means that $\sup_k (S_k - \varepsilon k) < \infty$ with probability 1, and hence by Corollary 12.2.5 the series $\sum_{k=1}^{\infty} \mathbf{P}(S_k > \varepsilon k)/k$ converges. The theorem is proved.


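Corollary 12.2.7 With probability 1 we have

\[ \limsup_{n \to \infty} \frac{S_n}{n} = \alpha, \]

where

\[ \alpha = \inf\Bigl\{ b : \sum_{k=1}^{\infty} \frac{\mathbf{P}(S_k > bk)}{k} < \infty \Bigr\}. \]


Proof For any $b > \alpha$, the series in the definition of the number $\alpha$ converges. Since $\{\limsup S_n/n \le b\}$ is a tail event and $S'_n = S_n - bn$ again form a sequence of sums of independent identically distributed random variables, Theorem 12.2.1 immediately implies that

\[ \mathbf{P}\Bigl( \limsup \frac{S_n}{n} \le b \Bigr) = 1, \]

\[ \mathbf{P}\Bigl( \limsup \frac{S_n}{n} \le \alpha \Bigr) = \mathbf{P}\Bigl( \bigcap_{k=1}^{\infty} \Bigl\{ \limsup \frac{S_n}{n} \le \alpha + \frac{1}{k} \Bigr\} \Bigr) = 1. \]

If we assume that $\mathbf{P}(\limsup S_n/n \le \alpha^*) = 1$ for $\alpha^* < \alpha$ then, for $\xi_k^* = \xi_k - \alpha^*$ and $S_n^* = \sum_{k=1}^{n} \xi_k^*$, we will have $\limsup S_n^*/n \le 0$, and

\[ \sum_{k=1}^{\infty} \frac{\mathbf{P}(S_k^* > \varepsilon k)}{k} = \sum_{k=1}^{\infty} \frac{\mathbf{P}(S_k > (\alpha^* + \varepsilon)k)}{k} < \infty \]

for any $\varepsilon > 0$, which contradicts the definition of $\alpha$. The corollary is proved.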


In order to derive the conventional law of large numbers from Theorem 12.2.1 it
suffices to use Corollary 12.2.7 and assertion 2 of Corollary 12.2.6. We obtain that in
the case $\mathbf{E}\xi = 0$ the value of $\alpha$ in Corollary 12.2.7 is 0 and hence $\limsup S_n/n = 0$
with probability 1. One can establish in the same way that $\liminf S_n/n = 0$.


12.3 Pollaczek-Spitzer’s Identity. An Identity for $S = \sup_{k \ge 0} S_k$


It is important to note that, besides Theorems 12.1.1 and 12.1.2, there exist a number of factorisation identities that give explicit representations (in terms of factorisation components) for ch.f.s of the so-called boundary functionals of the trajectory of the random walk $\{S_k\}$, i.e. functionals associated with the crossing by the trajectory of $\{S_k\}$ of certain levels (not just the zero level, as in Theorems 12.1.1-12.1.3). The functionals

\[ \overline{S}_n = \max_{k \le n} S_k, \qquad \theta_n = \min\{k : S_k = \overline{S}_n\} \]

and some others are also among the boundary functionals. For instance, for the triple transform of the joint distribution of $(\overline{S}_n, \theta_n)$, the following representation is valid. For $|z| < 1$, $|\rho| < 1/|z|$ and $\operatorname{Im} \lambda \ge 0$, one has

\[ (1 - z) \sum_{n=0}^{\infty} z^n\, \mathbf{E}\bigl( \rho^{\theta_n} e^{i\lambda \overline{S}_n} \bigr) = \frac{f_{z+}(0)}{f_{z\rho+}(\lambda)}. \]

(For more detail on factorisation identities, see [3].)
Among many consequences of this identity we will highlight two results that can also be established using the already available Theorems 12.1.1-12.1.3.


12.3.1 Pollaczek—Spitzer’s Identity 


So far we have obtained several factorisation identities as relations for numerators in representations (12.1.7), (12.1.8) and (12.1.9). Now we turn to the denominators. We will obtain one more identity playing an important role in studying the distributions of

\[ \overline{S}_n = \max(0, \zeta_n) = \max(0, S_1, \ldots, S_n). \]

This is the so-called Pollaczek-Spitzer identity relating the ch.f.s of $\overline{S}_n$, $n = 1, 2, \ldots$, with those of $\max(0, S_n)$, $n = 1, 2, \ldots$.


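Theorem 12.3.1 For $|z| < 1$ and $\operatorname{Im} \lambda \ge 0$,

\[ \sum_{n=0}^{\infty} z^n\, \mathbf{E} e^{i\lambda \overline{S}_n} = \exp\Bigl\{ \sum_{k=1}^{\infty} \frac{z^k}{k}\, \mathbf{E} e^{i\lambda \max(0, S_k)} \Bigr\}. \]

Using the notation of Theorem 12.1.1, one could write the right-hand side of this identity as

\[ \frac{f_{z+}(0)}{(1 - z) f_{z+}(\lambda)} \]

(see the last relation in the proof of the theorem).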


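Proof Theorems 12.1.1-12.1.3 (as well as their modifications of the form (12.1.13)) and the uniqueness of the canonical factorisation imply that

\[ \sum_{k=0}^{\infty} z^k\, \mathbf{E}\bigl( e^{i\lambda S_k};\ \zeta_k < 0 \bigr) = \bigl[ 1 - \mathbf{E}\bigl( e^{i\lambda \chi_-} z^{\eta_-};\ \eta_- < \infty \bigr) \bigr]^{-1} = f_{z-}^{-1}(\lambda), \]

where we assume that $\mathbf{E}(e^{i\lambda S_0};\ \zeta_0 < 0) = 1$, so all the functions in the above relation turn into 1 at $\lambda = -i\infty$. Set

\[ S_k^* := S_{n-k} - S_n, \qquad \theta_n^* := \min\bigl\{ k : S_k^* = \overline{S}_n^* := \max(0, S_1^*, \ldots, S_n^*) \bigr\} \]

($\theta_n^*$ is the time of the first maximum in the sequence $0, S_1^*, \ldots, S_n^*$). Then the event $\{S_n \in dx,\ \zeta_n < 0\}$ can be rewritten as $\{S_n^* \in -dx,\ \theta_n^* = n\}$. This implies that

\[ \mathbf{E}\bigl( e^{i\lambda S_n};\ \zeta_n < 0 \bigr) = \mathbf{E}\bigl( e^{-i\lambda S_n^*};\ \theta_n^* = n \bigr), \]

\[ \sum_{n=0}^{\infty} z^n\, \mathbf{E}\bigl( e^{-i\lambda S_n^*};\ \theta_n^* = n \bigr) = f_{z-}^{-1}(\lambda). \tag{12.3.1} \]

But the sequence $S_1^*, \ldots, S_n^*$ is distributed identically to the sequence of sums $\xi_1^*, \xi_1^* + \xi_2^*, \ldots, \xi_1^* + \cdots + \xi_n^*$, where $\xi_k^* = -\xi_k$. If we put $\theta_n := \min\{k : S_k = \overline{S}_n\}$ then identity (12.3.1) can be equivalently rewritten as

\[ \sum_{n=0}^{\infty} z^n\, \mathbf{E}\bigl( e^{i\lambda S_n};\ \theta_n = n \bigr) = \bigl( f_{z-}^*(-\lambda) \bigr)^{-1}, \]

where $f_{z-}^*(\lambda)$ is the negative factorisation component of the function $1 - z\varphi^*(\lambda) = 1 - z\varphi(-\lambda)$ corresponding to the random variable $-\xi$. Since

\[ 1 - z\varphi(-\lambda) = f_{z+}(-\lambda)\, C(z)\, f_{z-}(-\lambda) \]

and the function $f_{z+}(-\lambda)$ possesses all the properties of the negative component $f_{z-}^*(\lambda)$ of the factorisation of $1 - z\varphi^*(\lambda)$, while the function $f_{z-}(-\lambda)$ has all the properties of a positive component, we see that $f_{z-}^*(\lambda) = f_{z+}(-\lambda)$ and

\[ \sum_{n=0}^{\infty} z^n\, \mathbf{E}\bigl( e^{i\lambda S_n};\ \theta_n = n \bigr) = \frac{1}{f_{z+}(\lambda)}. \]

Now we note that

\[ \mathbf{E} e^{i\lambda \overline{S}_n} = \sum_{k=0}^{n} \mathbf{E}\bigl( e^{i\lambda \overline{S}_n};\ \theta_n = k \bigr) \]

\[ = \sum_{k=0}^{n} \mathbf{E}\bigl( e^{i\lambda S_k};\ \theta_k = k,\ S_{k+1} - S_k \le 0, \ldots, S_n - S_k \le 0 \bigr) \]

\[ = \sum_{k=0}^{n} \mathbf{E}\bigl( e^{i\lambda S_k};\ \theta_k = k \bigr)\, \mathbf{P}(\overline{S}_{n-k} = 0). \]

Since the right-hand side is the convolution of two sequences, we obtain that

\[ \sum_{n=0}^{\infty} z^n\, \mathbf{E} e^{i\lambda \overline{S}_n} = \frac{1}{f_{z+}(\lambda)} \sum_{n=0}^{\infty} z^n\, \mathbf{P}(\overline{S}_n = 0). \]

Putting $\lambda = 0$ we get

\[ \sum_{n=0}^{\infty} z^n\, \mathbf{P}(\overline{S}_n = 0) = \frac{f_{z+}(0)}{1 - z}. \]

Therefore,

\[ \sum_{n=0}^{\infty} z^n\, \mathbf{E} e^{i\lambda \overline{S}_n} = \frac{f_{z+}(0)}{(1 - z) f_{z+}(\lambda)} \]

\[ = \exp\Bigl\{ -\ln(1 - z) + \sum_{k=1}^{\infty} \frac{z^k}{k}\, \mathbf{E}\bigl( e^{i\lambda S_k};\ S_k > 0 \bigr) - \sum_{k=1}^{\infty} \frac{z^k}{k}\, \mathbf{P}(S_k > 0) \Bigr\} \]

\[ = \exp\Bigl\{ \sum_{k=1}^{\infty} \frac{z^k}{k}\, \mathbf{E}\bigl( e^{i\lambda S_k};\ S_k > 0 \bigr) + \sum_{k=1}^{\infty} \frac{z^k}{k}\, \mathbf{P}(S_k \le 0) \Bigr\} \]

\[ = \exp\Bigl\{ \sum_{k=1}^{\infty} \frac{z^k}{k}\, \mathbf{E} e^{i\lambda \max(0, S_k)} \Bigr\}. \]

The theorem is proved.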
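A small numerical illustration may help here. Differentiating the Pollaczek-Spitzer identity in $\lambda$ at $\lambda = 0$ and comparing the coefficients of $z^n$ gives the well-known relation $\mathbf{E}\overline{S}_n = \sum_{k=1}^{n} \mathbf{E}\max(0, S_k)/k$; this consequence is not derived in the text above and is used only as a check. The sketch below verifies it by exact enumeration for a walk with steps $\pm 1$. The step distribution and all function names are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import product


def path_probability(steps, p_up):
    """Probability of a given sequence of +1/-1 steps (p_up is P(step = +1))."""
    prob = Fraction(1)
    for x in steps:
        prob *= p_up if x == 1 else 1 - p_up
    return prob


def expected_running_max(n, p_up):
    """Exact E max(0, S_1, ..., S_n), computed by enumerating all 2^n paths."""
    total = Fraction(0)
    for steps in product((1, -1), repeat=n):
        s, best = 0, 0
        for x in steps:
            s += x
            best = max(best, s)
        total += path_probability(steps, p_up) * best
    return total


def expected_positive_part(k, p_up):
    """Exact E max(0, S_k) for the same walk."""
    total = Fraction(0)
    for steps in product((1, -1), repeat=k):
        total += path_probability(steps, p_up) * max(0, sum(steps))
    return total


def spitzer_sum(n, p_up):
    """Right-hand side: sum over k = 1..n of E max(0, S_k) / k."""
    return sum(expected_positive_part(k, p_up) / k for k in range(1, n + 1))


if __name__ == "__main__":
    p = Fraction(1, 3)  # hypothetical choice giving negative drift
    for n in range(1, 8):
        print(n, expected_running_max(n, p), spitzer_sum(n, p))
```

Exact rational arithmetic is used so that the two sides can be compared for equality rather than up to rounding.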




12.3.2 An Identity for $S = \sup_{k \ge 0} S_k$


The second useful identity to be discussed in this subsection is associated with the distribution of the random variable $S = \sup_{k \ge 0} S_k = \max(0, \zeta)$ (of course, we deal here with the cases when $\mathbf{P}(S < \infty) = 1$). This distribution is of interest in many applications. Two such illustrative applications will be discussed in the next subsection.

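We will establish the relationship of the distribution of $S$ with that of the vector $(\chi_+, \eta_+)$ and with the factorisation components of the function $1 - z\varphi(\lambda)$.

First of all, note that the random variable $\eta_+$ is a Markov time. For such variables, one can easily see (cf. Lemma 10.2.1) that the sequence $\xi_1^* = \xi_{\eta_+ + 1}$, $\xi_2^* = \xi_{\eta_+ + 2}, \ldots$ on the set $\{\omega : \eta_+ < \infty\}$ (or given that $\eta_+ < \infty$) is distributed identically to $\xi_1, \xi_2, \ldots$ and does not depend on $(\eta_+, \xi_1, \ldots, \xi_{\eta_+})$. Indeed,

\[ \mathbf{P}\bigl( \xi_1^* \in B_1, \ldots, \xi_k^* \in B_k \mid \eta_+ = j,\ \xi_1 \in A_1, \ldots, \xi_{\eta_+} \in A_{\eta_+} \bigr) \]
\[ = \mathbf{P}\bigl( \xi_{j+1} \in B_1, \ldots, \xi_{j+k} \in B_k \mid \xi_1 \in A_1, \ldots, \xi_j \in A_j;\ \eta_+ = j \bigr) \]
\[ = \mathbf{P}\bigl( \xi_1 \in B_1, \ldots, \xi_k \in B_k \bigr). \]

Considering the new sequence $\{\xi_k^*\}_{k=1}^{\infty}$, we note that it will exceed level 0 (the level $\chi_+$ for the original sequence) with probability $p = \mathbf{P}(\eta_+ < \infty)$, and that the distribution of $\zeta^* = \sup_{k \ge 1}(\xi_1^* + \cdots + \xi_k^*)$ coincides with the distribution of $\zeta = \sup_{k \ge 1} S_k$.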

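Thus, with $S^* := \max(0, \zeta^*)$, we have

\[ S = S(\omega) = \begin{cases} 0 & \text{on } \{\omega : \eta_+ = \infty\}, \\ \chi_+ + S^* & \text{on } \{\omega : \eta_+ < \infty\}. \end{cases} \]

Since, as has already been noted, $S^*$ does not depend on $\chi_+$ and $\eta_+$, and the distribution of $S^*$ coincides with that of $S$, we have

\[ \mathbf{E} e^{i\lambda S} = \mathbf{P}(\eta_+ = \infty) + \mathbf{E}\bigl( e^{i\lambda(\chi_+ + S^*)};\ \eta_+ < \infty \bigr) \]
\[ = (1 - p) + \mathbf{E} e^{i\lambda S}\, \mathbf{E}\bigl( e^{i\lambda \chi_+};\ \eta_+ < \infty \bigr). \]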
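This implies the following result.

Theorem 12.3.2 If $\sum_{k=1}^{\infty} \mathbf{P}(S_k > 0)/k < \infty$ or, which is the same, $p = \mathbf{P}(\eta_+ < \infty) < 1$, then

\[ \mathbf{E} e^{i\lambda S} = \frac{1 - p}{1 - \mathbf{E}(e^{i\lambda \chi_+};\ \eta_+ < \infty)} = \frac{1 - p}{f_{1+}(\lambda)}. \]

In exactly the same way we can obtain the relation

\[ \mathbf{E} e^{i\lambda S} = \frac{1 - p_0}{1 - \mathbf{E}(e^{i\lambda \chi_+^0};\ \eta_+^0 < \infty)}, \tag{12.3.2} \]

where $p_0 = \mathbf{P}(\eta_+^0 < \infty) < 1$.
In this case, one can write a factorisation identity in the following form:

\[ 1 - \varphi(\lambda) = \frac{(1 - p_0)(1 - \mathbf{E} e^{i\lambda \chi_-})}{\mathbf{E} e^{i\lambda S}} = \frac{(1 - p)(1 - \mathbf{E} e^{i\lambda \chi_-^0})}{\mathbf{E} e^{i\lambda S}}. \tag{12.3.3} \]

In Sects. 12.5-12.7 we will discuss the possibility of finding the explicit form and the asymptotic properties of the distribution of $S$.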


12.4 The Distribution of S in Insurance Problems and Queueing 
Theory 


In this section we show that the need to analyse the distribution of the variable S 
considered in Sect. 12.3 arises in insurance problems and also when studying queue- 
ing systems. 


12.4.1 Random Walks in Risk Theory 


Consider the following simplified model of an insurance business operation. Denote by $x$ the initial surplus of the company and consider the daily dynamics of the surplus. During the $k$-th day the company receives insurance premiums at the rate $\xi_k^+ \ge 0$ and pays out claims made by insured persons at the rate $\xi_k^- \ge 0$ (in case of a fire, a traffic accident, and so on). The amounts $\xi_k = \xi_k^- - \xi_k^+$ are random since they depend on the number of newly insured persons, the size of premiums, claim amounts and so on. For a foreseeable “homogeneous” time period, the amounts $\xi_k$ can be assumed to be independent and identically distributed. If we put $S_n := \sum_{k=1}^{n} \xi_k$ then the company’s surplus after $n$ days will be $Z_n = x - S_n$, provided that we allow it to be negative. But if we assume that the company ruins at the time when $Z_n$ first becomes negative, then the probability of no ruin during the first $n$ days equals

\[ \mathbf{P}\Bigl( \min_{k \le n} Z_k \ge 0 \Bigr) = \mathbf{P}(\overline{S}_n \le x), \]

where, as above, $\overline{S}_n = \max_{k \le n} S_k$. Accordingly, the probability of ruin within $n$ days is equal to $\mathbf{P}(\overline{S}_n > x)$, and the probability of ruin in the long run can be identified with $\mathbf{P}(S > x)$. It follows that, for the probability of ruin to be less than 1, it is necessary that $\mathbf{E}\xi_k < 0$ or, which is the same, that $\mathbf{E}\xi_k^- < \mathbf{E}\xi_k^+$. When this condition is satisfied, in order to make the probability of ruin small enough, one has to make the initial surplus $x$ large enough. In this connection it is of interest to find the explicit form of the distribution of $S$, or at least the asymptotic behaviour of $\mathbf{P}(S > x)$ as $x \to \infty$. Sections 12.5-12.7 will be focused on this.


12.4.2 Queueing Systems 


Imagine that “customers” who are to be served by a certain system arrive with time intervals $\tau_1, \tau_2, \ldots$ between successive arrivals. These could be phone calls, planes landing at an airport, clients in a shop, messages to be processed by a computer, etc. Assume that serving the $k$-th customer (the first customer arrived at time 0, the second at time $\tau_1$, and so on) requires time $s_k$, $k = 1, 2, \ldots$ If, at the time of the $k$-th customer’s arrival, the system was busy serving one of the preceding customers, the newly arrived customer joins the “queue” and waits for service which starts immediately after the system has finished serving all the preceding customers. The problem is to find the distribution of the waiting time $w_n$ of the $n$-th customer, i.e. the time spent waiting for the service.

Let us find out how the quantities $w_{n+1}$ and $w_n$ are related to each other. The $(n+1)$-th customer arrived $\tau_n$ time units after the $n$-th customer, but will have to wait for an extra $s_n$ time units during the service of the $n$-th customer. Therefore,

\[ w_{n+1} = w_n - \tau_n + s_n, \]

only if $w_n - \tau_n + s_n \ge 0$. If $w_n - \tau_n + s_n < 0$ then clearly $w_{n+1} = 0$. Thus, if we put $\xi_{n+1} := s_n - \tau_n$ then

\[ w_{n+1} = \max(0, w_n + \xi_{n+1}), \qquad n \ge 1, \tag{12.4.1} \]


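with the initial value of $w_1 \ge 0$. Let us find the solution to this recurrence equation. Let, as above, $S_n = \sum_{k=1}^{n} \xi_k$. Denote by $\theta(n)$ the time when the trajectory of $0, S_1, \ldots, S_n$ first attains its minimum:

\[ \theta(n) := \min\{k : S_k = \underline{S}_n\}, \qquad \underline{S}_n := \min_{j \le n} S_j. \]

Then clearly (for $w_0 := w_1$)

\[ w_{n+1} = w_1 + S_n \quad \text{if } w_{\theta(n)} = w_1 + \underline{S}_n > 0 \tag{12.4.2} \]

(since in this case the right-hand side of (12.4.1) does not vanish and $w_{k+1} = w_k + \xi_k$ for all $k \le n$), and

\[ w_{n+1} = S_n - S_{\theta(n)} \quad \text{if } w_1 + \underline{S}_n \le 0 \tag{12.4.3} \]

($w_{\theta(n)} = 0$ and $w_{k+1} = w_k + \xi_k$ for all $k \ge \theta(n)$). Put

\[ S_{n,j} := \sum_{k=n-j+1}^{n} \xi_k, \qquad \overline{S}_{n,n} := \max_{0 \le j \le n} S_{n,j}, \]

so that

\[ S_{n,0} = 0, \qquad S_{n,n} = S_n. \]

Then

\[ S_n - S_{\theta(n)} = S_n - \underline{S}_n = \max_{0 \le j \le n} (S_n - S_j) = \overline{S}_{n,n}, \]

so that $w_1 + \underline{S}_n = w_1 + S_n - \overline{S}_{n,n}$ and the inequality $w_1 + \underline{S}_n \le 0$ in (12.4.3) is equivalent to the inequality $\overline{S}_{n,n} = S_n - S_{\theta(n)} \ge w_1 + S_n$. Therefore (12.4.2) and (12.4.3) can be rewritten as

\[ w_{n+1} = \max(\overline{S}_{n,n},\ w_1 + S_n). \tag{12.4.4} \]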
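This implies that, for each fixed $x > 0$,

\[ \mathbf{P}(w_{n+1} > x) = \mathbf{P}(\overline{S}_{n,n} > x) + \mathbf{P}(\overline{S}_{n,n} \le x,\ w_1 + S_n > x). \]

Now assume that $\xi_k \overset{d}{=} \xi$ are independent and identically distributed with $\mathbf{E}\xi < 0$. Then $\overline{S}_{n,n} \overset{d}{=} \overline{S}_n$ and, as $n \to \infty$, we have $S_n \to -\infty$ almost surely, $\mathbf{P}(w_1 + S_n > x) \to 0$ and $\mathbf{P}(\overline{S}_n > x) \uparrow \mathbf{P}(S > x)$. We conclude that, for any initial value $w_1$, the following limit exists:

\[ \lim_{n \to \infty} \mathbf{P}(w_n > x) = \mathbf{P}(S > x). \]

This distribution is called the stationary waiting time distribution. We already know that it will be proper if $\mathbf{E}\xi = \mathbf{E}s_1 - \mathbf{E}\tau_1 < 0$. As in the previous section, here arises the problem of finding the distribution of $S$. If, on the other hand, $\mathbf{E}s_1 > \mathbf{E}\tau_1$, or $\mathbf{E}s_1 = \mathbf{E}\tau_1$ and $s_1 \not\equiv \tau_1$, then the “stationary” waiting time will be infinite.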
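The passage from the recurrence (12.4.1) to the closed form (12.4.4) can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the indexing $w_{k+1} = \max(0, w_k + \xi_k)$ with $S_n = \xi_1 + \cdots + \xi_n$ (the convention of the parenthetical remarks after (12.4.2) and (12.4.3)), and it draws service and interarrival times from exponential distributions, which the text does not require. The function names are hypothetical.

```python
import random


def waiting_times(w1, xi):
    """Iterate w_{k+1} = max(0, w_k + xi_k); returns [w_1, ..., w_{n+1}]."""
    w = [w1]
    for x in xi:
        w.append(max(0.0, w[-1] + x))
    return w


def closed_form(w1, xi):
    """Evaluate max(max_{0<=j<=n}(S_n - S_j), w_1 + S_n) for n = 0, ..., len(xi)."""
    s = 0.0        # current partial sum S_n
    s_min = 0.0    # min of S_0 = 0, S_1, ..., S_n
    out = [max(0.0, w1)]           # n = 0: both terms reduce to max(0, w_1)
    for x in xi:
        s += x
        s_min = min(s_min, s)
        # S_n - s_min equals the maximum of the "backward" sums S_{n,j}.
        out.append(max(s - s_min, w1 + s))
    return out


if __name__ == "__main__":
    rng = random.Random(2024)
    # xi = service time minus interarrival time; means 0.5 and 1.0, so E xi < 0.
    xi = [rng.expovariate(2.0) - rng.expovariate(1.0) for _ in range(1000)]
    lhs = waiting_times(0.7, xi)
    rhs = closed_form(0.7, xi)
    print("largest discrepancy:", max(abs(a - b) for a, b in zip(lhs, rhs)))
```

The two sequences agree up to floating-point rounding for any initial value $w_1 \ge 0$, which is what (12.4.4) asserts.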


12.4.3 Stochastic Models in Continuous Time 


In the theory of queueing systems and risk theory one can equally well employ stochastic models in continuous time, when, instead of random walks $\{S_n\}$, one uses generalised renewal processes $Z(t)$ as described in Sect. 10.6. For a given sequence of independent identically distributed random vectors $(\tau_j, \zeta_j)$, the process $Z(t)$ is defined by the equality

\[ Z(t) := Z_{\nu(t)}, \]

where

\[ Z_n := \sum_{j=1}^{n} \zeta_j, \qquad \nu(t) := \max\{k : T_k \le t\}, \qquad T_k := \sum_{j=1}^{k} \tau_j. \]

For instance, in risk theory, the capital inflow during time $t$ that comes from regular premium payments can be described by the function $qt$, $q > 0$. The insurer covers claims of sizes $\zeta_1, \zeta_2, \ldots$ with time intervals $\tau_1, \tau_2, \ldots$ between them (the first claim is covered at time $\tau_1$). Thus, if the initial surplus is $x$, then the surplus at time $t$ will be

\[ x + qt - Z_{\nu(t)} = x + qt - Z(t). \]

The insurer ruins if $\inf_t (x + qt - Z(t)) < 0$ or, which is the same,

\[ \sup_t (Z(t) - qt) > x. \]

It is not hard to see that

\[ \sup_t (Z(t) - qt) = \sup_{k \ge 0} S_k =: S, \]

where $S_k = \sum_{j=1}^{k} \xi_j$, $\xi_j = \zeta_j - q\tau_j$. Thus the continuous-time version of the ruin problem for an insurance company also reduces to finding the distribution of the maximum of the cumulative sums.


12.5 Cases Where Factorisation Components Can Be Found in 
an Explicit Form. The Non-lattice Case 


As was already noted, the boundary functionals of random walks that were considered in Sects. 12.1-12.3 appear in many applied problems (see e.g., Sect. 12.4). This raises the question: in what cases can one find, in an explicit form, the factorisation components and hence the explicit form of the distributions of the boundary functionals we need? Here we will deal with factorisation of the function $1 - \varphi(\lambda)$ and will be interested in the boundary functionals $\chi_\pm$ and $S$.


12.5.1 Preliminary Notes on the Uniqueness of Factorisation 


As was already mentioned, the factorisation of the function $1 - \varphi(\lambda)$ obtained in Corollaries 12.2.2 and 12.2.4 is not canonical since that function vanishes at $\lambda = 0$. In this connection arises the question of whether a factorisation is unique. In other words, if, say, in the case $\mathbf{E}\xi < 0$, we obtained a factorisation

\[ 1 - \varphi(\lambda) = f_+(\lambda) f_-(\lambda), \]

where $f_\pm$ are analytic on $\Pi_\pm$ and continuous on $\Pi_\pm \cup \Pi$, then under what conditions can we state that

\[ \mathbf{E} e^{i\lambda S} = \frac{f_+(0)}{f_+(\lambda)} \]

(cf. Theorem 12.3.2)? In order to answer this question, in contrast to the above, we will have to introduce here restrictions on the distribution of $\xi$.


1. We will assume that $\mathbf{E}\xi$ exists, and in the case $\mathbf{E}\xi = 0$ that $\mathbf{E}\xi^2$ also exists.
2. Regarding the structure of the distribution of $\xi$ we will assume that either

(a) the distribution F is non-lattice and the Cramér condition on ch.f. holds:

\[ \limsup_{|\lambda| \to \infty,\ \operatorname{Im} \lambda = 0} |\varphi(\lambda)| < 1, \tag{12.5.1} \]

or
(b) the distribution F is arithmetic.


Condition (12.5.1) always holds once the distribution F has a nonzero absolutely continuous component. Indeed, if $F = F_a + F_s + F_d$ is the decomposition of F into the absolutely continuous, singular and discrete components then, by the Lebesgue theorem, $\int e^{i\lambda x} F_a(dx) \to 0$ as $|\lambda| \to \infty$ on $\operatorname{Im} \lambda = 0$, and so

\[ \limsup_{|\lambda| \to \infty} |\varphi(\lambda)| \le F_s((-\infty, \infty)) + F_d((-\infty, \infty)) < 1. \]

For lattice distributions concentrated at the points $a + hk$, $k$ being an integer, condition (12.5.1) is evidently not satisfied since, for $\lambda = 2\pi j/h$, we have $|\varphi(\lambda)| = |e^{i 2\pi a j/h}| = 1$ for all integers $j$. The condition is also not met for any discrete distribution, since any “part” of such a distribution, concentrated on a finite number of points, can be approximated arbitrarily well by a lattice distribution. For singular distributions, condition (12.5.1) can yet be satisfied.
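The remark on lattice distributions is easy to confirm numerically. The following minimal sketch evaluates the ch.f. of a distribution concentrated on points $a + hk$ at $\lambda = 2\pi j/h$; the particular values of $a$, $h$, the support and the probabilities are arbitrary assumptions chosen for the illustration.

```python
import cmath
import math


def char_fn(points, probs, lam):
    """Characteristic function sum_x p(x) * exp(i * lam * x) of a discrete law."""
    return sum(p * cmath.exp(1j * lam * x) for x, p in zip(points, probs))


# Hypothetical lattice distribution on the points a + h*k.
a, h = 0.3, 1.5
points = [a + h * k for k in (-2, 0, 1, 4)]
probs = [0.1, 0.4, 0.3, 0.2]

if __name__ == "__main__":
    for j in range(1, 5):
        lam = 2 * math.pi * j / h
        # Every term has the same phase exp(i*lam*a), so the modulus is 1.
        print(j, abs(char_fn(points, probs, lam)))
    # At a generic point the phases differ and the modulus is below 1.
    print(abs(char_fn(points, probs, 1.0)))
```

Since the modulus returns to 1 along the whole sequence $\lambda = 2\pi j/h$, the upper limit in (12.5.1) equals 1 for such a distribution.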

Since, for non-lattice distributions, $|\varphi(\lambda)| < 1$ for $\lambda \ne 0$, under condition (12.5.1) one has

\[ \sup_{|\lambda| > \varepsilon} |\varphi(\lambda)| < 1 \tag{12.5.2} \]

for any $\varepsilon > 0$. This means that the function $f(\lambda) = 1 - \varphi(\lambda)$ has no zeros on the real line $\Pi$ (completed by the points $\pm\infty$) except at the point $\lambda = 0$.

In case (b), when the distribution F is arithmetic, one can consider the ch.f. $\varphi(\lambda)$ on the segment $[0, 2\pi]$ only or, which is the same, consider the generating function $p(z) = \mathbf{E} z^{\xi}$, in which case we will be interested in the factorisation of the function $1 - p(z)$ on the unit circle $|z| = 1$.

Under the aforementioned conditions, we can “tweak” the function $1 - \varphi(\lambda)$ so that it allows canonical factorisation.

In this section we will confine ourselves to the non-lattice case. The arithmetic case will be considered in Sect. 12.6.


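Lemma 12.5.1 Let the distribution F be non-lattice and condition (12.5.1) hold. Then:

1. If $\mathbf{E}\xi < 0$ then the function

\[ \mathfrak{v}(\lambda) := \frac{1 - \varphi(\lambda)}{i\lambda}\,(i\lambda + 1) \tag{12.5.3} \]

belongs to $K$ and allows a unique canonical factorisation

\[ \mathfrak{v}(\lambda) = \mathfrak{v}_+(\lambda)\, \mathfrak{v}_-(\lambda), \]

where

\[ \mathfrak{v}_+(\lambda) := 1 - \mathbf{E}\bigl( e^{i\lambda \chi_+};\ \eta_+ < \infty \bigr) = \frac{1 - p}{\mathbf{E} e^{i\lambda S}}, \tag{12.5.4} \]

\[ \mathfrak{v}_-(\lambda) := \frac{1 - \mathbf{E} e^{i\lambda \chi_-^0}}{i\lambda}\,(i\lambda + 1). \tag{12.5.5} \]

2. If $\mathbf{E}\xi = 0$ and $\mathbf{E}\xi^2 < \infty$ then the function

\[ \mathfrak{v}^0(\lambda) := \frac{1 - \varphi(\lambda)}{\lambda^2}\,(\lambda^2 + 1) \tag{12.5.6} \]

belongs to $K$ and allows a unique canonical factorisation

\[ \mathfrak{v}^0(\lambda) = \mathfrak{v}_+^0(\lambda)\, \mathfrak{v}_-^0(\lambda), \]

where

\[ \mathfrak{v}_+^0(\lambda) := \frac{1 - \mathbf{E} e^{i\lambda \chi_+}}{i\lambda}\,(i\lambda - 1), \qquad \mathfrak{v}_-^0(\lambda) := \frac{1 - \mathbf{E} e^{i\lambda \chi_-^0}}{i\lambda}\,(i\lambda + 1) \tag{12.5.7} \]

(cf. Corollaries 12.2.2 and 12.2.4).


Here we do not consider the case $\mathbf{E}\xi > 0$ since it is “symmetric” to the case $\mathbf{E}\xi < 0$ and the corresponding assertion can be derived from assertion 1 of the lemma by applying it to the random variables $-\xi_k$ (or by changing $\lambda$ to $-\lambda$ in the identities), so that in the case $\mathbf{E}\xi > 0$, the function $\frac{1 - \varphi(\lambda)}{i\lambda}(i\lambda - 1)$ will allow a unique canonical factorisation.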

The uniqueness of the canonical factorisation immediately implies the following 
result. 


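Corollary 12.5.1 If, for $\mathbf{E}\xi < 0$, we have a canonical factorisation

\[ \mathfrak{v}(\lambda) = \mathfrak{w}_+(\lambda)\, \mathfrak{w}_-(\lambda), \]

then

\[ \mathbf{E} e^{i\lambda S} = \frac{\mathfrak{w}_+(0)}{\mathfrak{w}_+(\lambda)}. \tag{12.5.8} \]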


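Proof of Lemma 12.5.1 Let $\mathbf{E}\xi < 0$. Since

\[ \frac{1 - \varphi(\lambda)}{i\lambda} \to -\mathbf{E}\xi > 0 \]

as $\lambda \to 0$ and (12.5.1) is satisfied, we see that $\mathfrak{v}(\lambda)$ is bounded and continuous on $\Pi$ and is bounded away from zero. This means that $\mathfrak{v}(\lambda) \in K$.
Further, by Corollary 12.2.2 (see (12.2.1))

\[ 1 - \varphi(\lambda) = \bigl( 1 - \mathbf{E} e^{i\lambda \chi_-^0} \bigr) \bigl[ 1 - \mathbf{E}\bigl( e^{i\lambda \chi_+};\ \eta_+ < \infty \bigr) \bigr], \]

where $\mathbf{E}\chi_-^0 \in (-\infty, 0)$. Therefore, similarly to the above, we find that

\[ \mathfrak{v}_-(\lambda) = \frac{(1 - \mathbf{E} e^{i\lambda \chi_-^0})(i\lambda + 1)}{i\lambda} \in K. \]

Furthermore, $\mathfrak{v}_-(\lambda) \in K_-$ (the factor $i\lambda + 1$ has a zero at the point $\lambda = i \in \Pi_+$). Evidently, we also have

\[ \mathfrak{v}_+(\lambda) = 1 - \mathbf{E}\bigl( e^{i\lambda \chi_+};\ \eta_+ < \infty \bigr) \in K \cap K_+. \]

This proves the first assertion of the lemma. The last equality in (12.5.4) follows from Theorem 12.3.2. The uniqueness follows from Lemma 12.1.1.

The second assertion is proved in a similar way using Corollary 12.2.5, which implies that $\mathbf{E}\chi_+ \in (0, \infty)$, $\mathbf{E}\chi_-^0 \in (-\infty, 0)$, and

\[ \mathfrak{v}^0(\lambda) = \frac{(1 - \mathbf{E} e^{i\lambda \chi_+})(i\lambda - 1)}{i\lambda} \cdot \frac{(1 - \mathbf{E} e^{i\lambda \chi_-^0})(i\lambda + 1)}{i\lambda}, \tag{12.5.9} \]

where, as before, we can show that $\mathfrak{v}^0(\lambda) \in K$ and the factors on the right-hand side of (12.5.9) belong to $K \cap K_\pm$, correspondingly. The lemma is proved.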


12.5.2 Classes of Distributions on the Positive Half-Line with
Rational Ch.F.s


As we saw in Example 7.1.5, the ch.f. of the exponential distribution with density $\beta e^{-\beta x}$ on $(0, \infty)$ is $\beta/(\beta - i\lambda)$. The $j$-th power of this ch.f. corresponds to the gamma-distribution $\Gamma_{\beta, j}$ (the $j$-th convolution of the exponential distribution) with density (see Sect. 7.7)

\[ \frac{\beta^j x^{j-1} e^{-\beta x}}{(j - 1)!}, \qquad x > 0. \]

This means that a density of the form

\[ \sum_{k=1}^{K} \sum_{j=1}^{l_k} a_{kj}\, x^{j-1} e^{-\beta_k x} \tag{12.5.10} \]

on $(0, \infty)$ (where all $\beta_k > 0$ are different) can then be considered as a mixture of gamma-distributions and its ch.f. will be a rational function $P_m(\lambda)/Q_n(\lambda)$, where $P_m$ and $Q_n$ are polynomials of degrees $m$ and $n$, respectively (for definiteness, we can put

\[ Q_n(\lambda) = \prod_{k=1}^{K} (\beta_k - i\lambda)^{l_k}), \tag{12.5.11} \]

and necessarily $m < n$ (see Property 7.1.8) with $n = \sum_{k=1}^{K} l_k$. Here all the zeros of the polynomial $Q_n(-i\mu)$, regarded as a function of $\mu$, are real (the zeros of $Q_n(\lambda)$ itself are the purely imaginary points $\lambda = -i\beta_k$). But not only densities of the form (12.5.10) can have rational ch.f.s. Clearly, the Fourier transform of the function $e^{-\beta x} \cos \gamma x$, which can be rewritten as

\[ \frac{1}{2}\, e^{-\beta x} \bigl( e^{i\gamma x} + e^{-i\gamma x} \bigr), \tag{12.5.12} \]

will also be a rational function. Complex-valued functions of this kind will have poles that are symmetric with respect to the imaginary line (in our case, at the points $\lambda = -i\beta \pm \gamma$). Convolutions of functions of the form (12.5.12) will have a more complex form but will not go beyond representation (12.5.10), where $\beta_k$ are “symmetric” complex numbers. Clearly, densities of the form (12.5.10), where $\beta_k$ are either real and positive or complex and symmetric, $\operatorname{Re} \beta_k > 0$, exhaust all the distributions with rational ch.f.s (the coefficients of the “conjugate” complex-valued exponentials must coincide to avoid the presence of irremovable complex terms).

It is obvious that the converse is also true: rational ch.f.s $P_m(\lambda)/Q_n(\lambda)$ correspond to densities of the form (12.5.10). In order to show this it suffices to decompose $P_m(\lambda)/Q_n(\lambda)$ into partial fractions, for which the inverse Fourier transforms are known.

We will call densities of the form (12.5.10) on $(0, \infty)$ exponential polynomials with exponents $\beta_k$. We will call the number $l_k$ the multiplicity of the exponent $\beta_k$; it corresponds to the multiplicity of the pole of the Fourier transform at the point $\lambda = -i\beta_k$ (recall that $Q_n(\lambda) = \prod_{k=1}^{K} (\beta_k - i\lambda)^{l_k}$). One can approximate an arbitrary distribution on $(0, \infty)$ by exponential polynomials (for more details, see [3]).


12.5.3 Explicit Canonical Factorisation of the Function $\mathfrak{v}(\lambda)$ in
the Case when the Right Tail of the Distribution F Is an
Exponential Polynomial


Consider a distribution F on the whole real line $(-\infty, \infty)$ with $\mathbf{E}\xi < 0$ and such that, for $x > 0$, the distribution has a density that is an exponential polynomial (12.5.10). Denote by $\mathcal{EP}$ the class of all such distributions. The ch.f. of a distribution $F \in \mathcal{EP}$ can be represented as

\[ \varphi(\lambda) = \varphi^+(\lambda) + \varphi^-(\lambda), \]

where the function

\[ \varphi^-(\lambda) = \mathbf{E}\bigl( e^{i\lambda \xi};\ \xi \le 0 \bigr), \qquad \xi \subset= F, \]

is analytic on $\Pi_-$ and continuous on $\Pi_- \cup \Pi$, and $\varphi^+(\lambda)$ is a rational function

\[ \varphi^+(\lambda) = \frac{P_m(\lambda)}{Q_n(\lambda)}, \qquad m < n, \tag{12.5.13} \]

analytic on $\Pi_+$. Here $\varphi^+(\lambda)$ is a ch.f. up to the factor $\mathbf{P}(\xi > 0) > 0$.
It is important to note that, for real $\mu$, the equality

\[ \psi^+(\mu) := \varphi^+(-i\mu) = \mathbf{E}\bigl( e^{\mu \xi};\ \xi > 0 \bigr) = \frac{P_m(-i\mu)}{Q_n(-i\mu)} \tag{12.5.14} \]

only makes sense for $\mu < \beta_1$, where $\beta_1$ is the minimal zero of the polynomial $Q_n(-i\mu)$ (i.e. the pole of $\psi^+(\mu)$). It is necessarily a simple and real root since the function $\psi^+(\mu)$ is real and monotonically increasing. Further, $\psi^+(\mu) = \infty$ for $\mu \ge \beta_1$. Therefore the function $\mathbf{E}(e^{i\lambda \xi};\ \xi > 0)$ is undefined for $\operatorname{Re} i\lambda \ge \beta_1$ ($\operatorname{Im} \lambda \le -\beta_1$). However, the right-hand side of (12.5.14) (and hence $\varphi^+(\lambda)$) can be analytically continued onto the lower half-plane $\operatorname{Im} \lambda \le -\beta_1$ to a function defined on the whole complex plane. In what follows, when we will be discussing zeros of the function $1 - \varphi(\lambda)$ on $\Pi_-$, we will mean zeros computed for this analytic continuation, i.e. with $\varphi(\lambda)$ understood as the function $\varphi^-(\lambda) + P_m(\lambda)/Q_n(\lambda)$.
Further, note that, for distributions from the class $\mathcal{EP}$, the Cramér condition (12.5.1) on ch.f.s always holds, since $\varphi^+(\lambda) \to 0$ as $|\lambda| \to \infty$, and

\[ \limsup_{|\lambda| \to \infty,\ \lambda \in \Pi} |\varphi(\lambda)| = \limsup_{|\lambda| \to \infty,\ \lambda \in \Pi} |\varphi^-(\lambda)| \le \mathbf{P}(\xi \le 0) < 1. \]

For a distribution $F \in \mathcal{EP}$, the canonical factorisation of the functions $\mathfrak{v}(\lambda)$ and $\mathfrak{v}^0(\lambda)$ (see (12.5.3) and (12.5.6)) can be obtained in explicit form expressed in terms of the zeros of the function $1 - \varphi(\lambda)$.


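Theorem 12.5.1 Let there exist $\mathbf{E}\xi < 0$. In order for the positive component $\mathfrak{w}_+(\lambda)$ of a canonical factorisation

\[ \mathfrak{v}(\lambda) = \mathfrak{w}_+(\lambda)\, \mathfrak{w}_-(\lambda), \qquad \mathfrak{w}_\pm \in K_\pm \cap K, \]

to be a rational function, it is necessary and sufficient that the function

\[ \varphi^+(\lambda) = \mathbf{E}\bigl( e^{i\lambda \xi};\ \xi > 0 \bigr) \]

is rational.

If $\varphi^+ = P_m/Q_n$ is an uncancellable ratio of polynomials $P_m$ and $Q_n$ of degrees $m$ and $n$, respectively, $m < n$, then the function $1 - \varphi(\lambda)$ has precisely $n$ zeros on $\Pi_-$ (we denote them by $-i\mu_1, \ldots, -i\mu_n$), and

\[ \mathfrak{w}_+(\lambda) = \frac{\prod_{k=1}^{n} (\mu_k - i\lambda)}{Q_n(\lambda)}, \tag{12.5.15} \]

where $Q_n(-i\mu_k) \ne 0$ (i.e. ratio (12.5.15) is uncancellable).
If all zeros $-i\mu_k$ are arranged in descending order of their imaginary parts:

\[ \operatorname{Re} \mu_1 \le \operatorname{Re} \mu_2 \le \cdots \le \operatorname{Re} \mu_n, \]

then the zero $-i\mu_1$ will be simple and purely imaginary, $\mu_1 < \min(\operatorname{Re} \mu_2, \beta_1)$, where $\beta_1$ is the minimal zero of $Q_n(-i\mu)$.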


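The theorem implies that the component $\mathfrak{w}_-(\lambda)$ can also be found in an explicit form:

\[ \mathfrak{w}_-(\lambda) = \frac{(1 - \varphi(\lambda))(i\lambda + 1)\, Q_n(\lambda)}{i\lambda \prod_{k=1}^{n} (\mu_k - i\lambda)}. \]

From Corollary 12.5.1 we obtain the following assertion.


Corollary 12.5.2 If $\mathbf{E}\xi < 0$ and $\varphi^+ = P_m/Q_n$ then

\[ \mathbf{E} e^{i\lambda S} = \frac{\mathfrak{w}_+(0)}{\mathfrak{w}_+(\lambda)} = \frac{Q_n(\lambda)}{Q_n(0)} \prod_{k=1}^{n} \frac{\mu_k}{\mu_k - i\lambda}. \]

By Theorem 12.5.1 and (12.3.3) we also have

\[ \mathbf{E} e^{i\lambda \chi_-^0} = 1 - \frac{(1 - \varphi(\lambda))\, Q_n(\lambda)}{(1 - p) \prod_{k=1}^{n} (\mu_k - i\lambda)} \cdot \frac{\prod_{k=1}^{n} \mu_k}{Q_n(0)}, \]

\[ \mathbf{E}\bigl( e^{i\lambda \chi_+};\ \eta_+ < \infty \bigr) = 1 - \frac{(1 - p) \prod_{k=1}^{n} (\mu_k - i\lambda)}{Q_n(\lambda)} \cdot \frac{Q_n(0)}{\prod_{k=1}^{n} \mu_k}. \tag{12.5.16} \]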


Proof of Theorem 12.5.1 The proof of sufficiency will be divided into several stages.
1. In the vicinity of the point $\lambda = 0$ on the line $\Pi$, the value of

\[ \mathfrak{v}(\lambda) = \frac{(1 - \varphi(\lambda))(i\lambda + 1)}{i\lambda} \]

lies in the vicinity of the point $-\mathbf{E}\xi > 0$. By virtue of (12.5.2), outside a neighbourhood of zero one has

\[ \arg(1 - \varphi(\lambda)) \in \Bigl( -\frac{\pi}{2}, \frac{\pi}{2} \Bigr), \qquad \arg \frac{i\lambda + 1}{i\lambda} \in \Bigl( -\frac{\pi}{2}, \frac{\pi}{2} \Bigr), \tag{12.5.17} \]

where, for a complex number $z = |z| e^{i\gamma}$, $\arg z$ denotes the exponent $\gamma$. In (12.5.17) $\arg z$ means the principal value of the argument from $(-\pi, \pi]$. Clearly, $\arg z_1 z_2 = \arg z_1 + \arg z_2$. This implies that, when $\lambda$ changes from $-T$ to $T$ for large $T$, the values of $\arg \mathfrak{v}(\lambda)$ do not leave the interval $(-\pi, \pi)$ and do not come close to its boundaries. Moreover, the initial and final values of $\mathfrak{v}(\lambda)$ lie in the sector $\arg z \in (-\frac{\pi}{2}, \frac{\pi}{2})$. This means that, for any $T$, the following relation is valid for the index of the function $\mathfrak{v}$ on $[-T, T]$:

\[ \operatorname{ind}_T \mathfrak{v} := \frac{1}{2\pi} \int_{-T}^{T} d\bigl( \arg \mathfrak{v}(\lambda) \bigr) \in \Bigl( -\frac{1}{2}, \frac{1}{2} \Bigr) \tag{12.5.18} \]

(if the distribution F has a density on $(-\infty, 0]$ as well then $\varphi(\pm T) \to 0$ and $\operatorname{ind}_T \mathfrak{v} \to 0$ as $T \to \infty$).
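2. Represent the function $\mathfrak{v}$ as the product $\mathfrak{v}(\lambda) = \mathfrak{v}_1(\lambda)\, \mathfrak{v}_2(\lambda)$, where

\[ \mathfrak{v}_1(\lambda) = \frac{(i\lambda + 1)^n}{Q_n(\lambda)}, \qquad \mathfrak{v}_2(\lambda) = \frac{(1 - \varphi(\lambda))\, Q_n(\lambda)}{i\lambda (i\lambda + 1)^{n-1}} = \frac{Q_n(\lambda) - P_m(\lambda) - Q_n(\lambda) \varphi^-(\lambda)}{i\lambda (i\lambda + 1)^{n-1}}. \tag{12.5.19} \]

We show that

\[ |n + \operatorname{ind}_T \mathfrak{v}_2| < \frac{1}{2}. \tag{12.5.20} \]

In order to do this, we first note that the function $\mathfrak{v}_1$ is analytic on $\Pi_+$ and has there a zero of multiplicity $n$ at the point $\lambda = i$. Consider a closed contour $\mathcal{T}_T^+$ consisting of the segment $[-T, T]$ and the semicircle $|\lambda| = T$ lying in $\Pi_+$. According to the argument principle in complex function theory, the number of zeros of the function $\mathfrak{v}_1$ inside $\mathcal{T}_T^+$ equals the increment of the argument of $\mathfrak{v}_1(\lambda)$ divided by $2\pi$ when moving along the contour $\mathcal{T}_T^+$ in the positive direction, i.e.

\[ \frac{1}{2\pi} \int_{\mathcal{T}_T^+} d \arg \mathfrak{v}_1(\lambda) = n. \]

As, moreover, $\mathfrak{v}_1(\lambda) \to (-1)^n = \text{const}$ as $|\lambda| \to \infty$ (see (12.5.11) and (12.5.19)), we see that the increment of $\arg \mathfrak{v}_1$ on the semicircle tends to 0 as $T \to \infty$, and hence

\[ \operatorname{ind}_T \mathfrak{v}_1 \to \operatorname{ind} \mathfrak{v}_1 := \frac{1}{2\pi} \int_{-\infty}^{\infty} d \arg \mathfrak{v}_1(\lambda) = n. \]

It remains to note that $\operatorname{ind}_T \mathfrak{v} = \operatorname{ind}_T \mathfrak{v}_1 + \operatorname{ind}_T \mathfrak{v}_2$ and make use of (12.5.18).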

3. We show that $1 - \varphi(\lambda)$ has precisely $n$ zeros in $\Pi_-$. To this end, we first show that the function $\mathfrak{v}_2(\lambda)$, which is analytic in $\Pi_-$ and continuous on $\Pi_- \cup \Pi$, has $n$ zeros in $\Pi_-$. Consider the positively oriented closed contour $\mathcal{T}_T^-$ consisting of the segment $[-T, T]$ (traversed in the negative direction) and the lower half of the circle $|\lambda| = T$, and compute

\[ \frac{1}{2\pi} \int_{\mathcal{T}_T^-} d \arg \mathfrak{v}_2(\lambda). \tag{12.5.21} \]

Since $\mathfrak{v}_2(\lambda) \sim (-1)^n (1 - \varphi^-(\lambda))$ (see (12.5.11) and (12.5.19)) and $|\varphi^-(\lambda)| < 1$ as $|\lambda| \to \infty$, $\operatorname{Im} \lambda \le 0$, for large $T$ the part of integral (12.5.21) over the semicircle will be less than 1/2 in absolute value. Comparing this with (12.5.20) we obtain that integral (12.5.21), being an integer, is necessarily equal to $n$. This means that $\mathfrak{v}_2(\lambda)$ has exactly $n$ zeros in $\Pi_-$, which we will denote by $-i\mu_1, \ldots, -i\mu_n$. Since $Q_n(-i\mu_k) \ne 0$ (otherwise we would have, by (12.5.19), $P_m(-i\mu_k) = 0$, which would mean cancellability of the fraction $P_m/Q_n$), the function $1 - \varphi(\lambda)$ has in $\Pi_-$ the same zeros as $\mathfrak{v}_2(\lambda)$ (see (12.5.19)).

4. It remains to put

\[ \mathfrak{w}_+(\lambda) = \frac{\prod_{k=1}^{n} (\mu_k - i\lambda)}{Q_n(\lambda)}, \qquad \mathfrak{w}_-(\lambda) = \frac{\bigl( Q_n(\lambda) - P_m(\lambda) - Q_n(\lambda) \varphi^-(\lambda) \bigr)(i\lambda + 1)}{i\lambda \prod_{k=1}^{n} (\mu_k - i\lambda)}, \]

and note that $\mathfrak{w}_\pm \in K_\pm \cap K$.
The last assertion of the theorem follows from the fact that the real function $\psi(\mu) = \varphi(-i\mu)$ for $\operatorname{Im} \mu = 0$ is convex on $[0, \beta_1)$, $\psi(0) = 1$, $\psi'(0) = \mathbf{E}\xi < 0$ and $\psi(\mu) \to \infty$ as $\mu \to \beta_1$. Therefore on $(0, \beta_1)$ there exists a unique real solution $\mu_1$ to the equation $\psi(\mu) = 1$. There are no complex zeros in the half-plane $\operatorname{Re} \mu \le \mu_1$ since in this region, for $\operatorname{Im} \mu \ne 0$, one has

\[ |\psi(\mu)| < \psi(\operatorname{Re} \mu) \le \psi(\mu_1) = 1 \]

because of the presence of an absolutely continuous component.
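Necessity. Now let $\mathfrak{w}_+(\lambda)$ be rational. This means that

\[ \mathfrak{w}_+(\lambda) = c_1 + \int_0^{\infty} e^{i\lambda x} g(x)\, dx, \]

where $c_1 = \mathfrak{w}_+(i\infty)$ and $g(x)$ is an exponential polynomial. It follows from the equality (see (12.5.5))

\[ 1 - \varphi(\lambda) = \mathfrak{w}_+(\lambda)\, \frac{\mathfrak{w}_-(\lambda)\, i\lambda}{i\lambda + 1} = c_2\, \mathfrak{w}_+(\lambda) \bigl( 1 - \mathbf{E} e^{i\lambda \chi_-^0} \bigr) \]

\[ = c_2 \Bigl( c_1 + \int_0^{\infty} e^{i\lambda x} g(x)\, dx \Bigr) \int_{-\infty}^{0} e^{i\lambda x}\, dW(x), \]

where $W(x) = -\mathbf{P}(\chi_-^0 < x)$ for $x \le 0$, $c_2 = \text{const}$, that $\xi$ has a density for $x > 0$ that is equal, up to a constant factor, to

\[ \int_{-\infty}^{0} dW(t)\, g(x - t). \]

Since the integral

\[ \int_{-\infty}^{0} dW(t)\, (x - t)^k e^{-\beta(x - t)} = e^{-\beta x} \sum_{j=0}^{k} \binom{k}{j} x^j (-1)^{k-j} \int_{-\infty}^{0} dW(t)\, t^{k-j} e^{\beta t} \]

is an exponential polynomial, the integral $\int_{-\infty}^{0} dW(t)\, g(x - t)$ is also an exponential polynomial, which implies the rationality of $\mathbf{E}(e^{i\lambda \xi};\ \xi > 0)$. The theorem is proved.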


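Example 12.5.1 Let the distribution F be exponential on the positive half-line:

\[ \mathbf{P}(\xi \ge x) = q e^{-\beta x}, \qquad \beta > 0, \quad q \le 1. \]

Then $\varphi^+(\lambda) = q\beta/(\beta - i\lambda)$ and we can put $m = 0$, $n = 1$, $P_0(\lambda) = q\beta$, $Q_1(\lambda) = \beta - i\lambda$. The equation $\psi(\mu) := \mathbf{E} e^{\mu \xi} = 1$ has, in the half-plane $\operatorname{Re} \mu > 0$, the unique solution $\mu_1$,

\[ \mathfrak{w}_+(\lambda) = \frac{\mu_1 - i\lambda}{Q_1(\lambda)} \]

(see (12.5.15)). By Corollary 12.5.2,

\[ \mathbf{E} e^{i\lambda S} = \frac{Q_1(\lambda)}{Q_1(0)} \cdot \frac{\mu_1}{\mu_1 - i\lambda} = \frac{\mu_1 (\beta - i\lambda)}{\beta (\mu_1 - i\lambda)} = \frac{\mu_1}{\beta} + \frac{\beta - \mu_1}{\beta} \cdot \frac{\mu_1}{\mu_1 - i\lambda}. \]

This yields $\mathbf{P}(S = 0) = \mu_1/\beta$,

\[ \frac{\mathbf{P}(S \in dx)}{dx} = \Bigl( 1 - \frac{\mu_1}{\beta} \Bigr) \mu_1 e^{-\mu_1 x} \quad \text{for } x > 0, \]

i.e. the distribution of $S$ is exponential on $(0, \infty)$ with parameter $\mu_1$ and has a positive atom at zero.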
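To make the example concrete, one can pick a specific left tail and solve $\psi(\mu) = 1$ numerically. The sketch below assumes, purely for illustration, that $\xi = X - Y$ with independent exponential $X$ (rate $\beta$) and $Y$ (rate $\alpha < \beta$); then $\mathbf{P}(\xi \ge x) = q e^{-\beta x}$ for $x \ge 0$ with $q = \alpha/(\alpha + \beta)$ and $\psi(\mu) = \frac{\beta}{\beta - \mu} \cdot \frac{\alpha}{\alpha + \mu}$, so the root is $\mu_1 = \beta - \alpha$ in closed form. The function names and the bisection tolerances are assumptions of the sketch.

```python
import math


def psi(mu, alpha, beta):
    """E exp(mu * xi) for xi = X - Y, X ~ Exp(beta), Y ~ Exp(alpha), -alpha < mu < beta."""
    return (beta / (beta - mu)) * (alpha / (alpha + mu))


def positive_root(alpha, beta, tol=1e-12):
    """Bisection for the root of psi(mu) = 1 on (0, beta); needs alpha < beta (E xi < 0)."""
    lo = 1e-9                   # psi < 1 just to the right of 0 because psi'(0) = E xi < 0
    hi = beta * (1.0 - 1e-12)   # psi blows up as mu approaches beta from below
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi(mid, alpha, beta) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def atom_at_zero(alpha, beta):
    """P(S = 0) = mu_1 / beta, as in Example 12.5.1."""
    return positive_root(alpha, beta) / beta


def tail(x, alpha, beta):
    """P(S > x) = (1 - mu_1/beta) * exp(-mu_1 * x) for x >= 0."""
    mu1 = positive_root(alpha, beta)
    return (1.0 - mu1 / beta) * math.exp(-mu1 * x)


if __name__ == "__main__":
    alpha, beta = 1.0, 1.5
    print("mu_1     =", positive_root(alpha, beta))
    print("P(S = 0) =", atom_at_zero(alpha, beta))
    print("P(S > 2) =", tail(2.0, alpha, beta))
```

With these assumptions $S$ is the stationary waiting time of a single-server queue with exponential interarrival and service times (Sect. 12.4.2), and the atom $\mu_1/\beta = 1 - \alpha/\beta$ is the probability that an arriving customer does not wait.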


Example 12.5.2 (A generalisation of Example 12.5.1) Let F have, on the positive 
half-line, the density }°7_, a,e~Pk* (a sum of exponentials), where 0 < f; < Bo < 
+++ < By, ax > 0. Then 


n 
On(a) = | [ (Be - id). 
k=1 
As was already noted in Theorem 12.5.1, the equation (2) := g(—iA) = 1 has, on 
the interval (0, 6), a unique zero jz;. The function w7 (2) := g (—iZ) is continu- 
ous, positive, and bounded for uz > 0. On each interval (6;, Bx41),k =1,...,n—-1, 
the function 
"aK Be 
Wt (uw) := 97 (-in) =) ——— 
em 

is continuous and changes from —oo to oo. Therefore, on each of these inter- 
vals, there exists at least one root x41 of the equation y(jw) = 1. Since by The- 
orem 12.5.1 there are only 7 roots of this equation in Re pz > 0, we obtain that x41 
is the unique root in (6;, 6,+1) and 


04 (A) = 


Tiare?) pias _ Il Be =i)k 49 5,99) 


On (A) (Mk — iA) BR 
This means that 1 — p:= P(S =0) =[[p_y ca and 


k=] 


P(S edx) _ 


n 
7 > be "for x > 0, 
x 


k=1 
where wx € (Be—1, Be), K=1,...,, Bo =O, and the coefficients b, are defined by 
the decomposition of (12.5.22) into partial fractions. 
By (12.5.16),
$$\mathbf{E}\big(e^{i\lambda\chi_+};\ \eta_+ < \infty\big) = 1 - \prod_{k=1}^{n}\frac{\mu_k - i\lambda}{\beta_k - i\lambda}, \tag{12.5.23}$$
so the conditional distribution of $\chi_+$ given $\eta_+ < \infty$ has a density which is equal to
$$\frac{1}{p}\sum_{k=1}^{n} c_k e^{-\beta_k x}, \tag{12.5.24}$$
where the coefficients $c_k$, similarly to the above, are defined by the expansion
of the right-hand side of (12.5.23) into partial fractions. Relation (12.5.24) means
that the density of $\chi_+$ has the same “structure” as the density of $\xi$ does for $x > 0$,
but differs in coefficients of the exponentials only. By (12.5.16) this property of the
density of $\chi_+$ holds in the general case as well.
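
The partial-fraction step of Example 12.5.2 can be sketched numerically as follows. The left part of the distribution (an exponential density $c\,\alpha e^{\alpha x}$ on $x < 0$), the parameter values and the helper names are assumptions made only for illustration. The roots $\mu_k$ are located by bisection in the intervals $(\beta_{k-1}, \beta_k)$, and the coefficients $b_k$ are the residues of the product in (12.5.22) at $i\lambda = \mu_k$.

```python
# Sketch for Example 12.5.2 with an ASSUMED left part: density c*alpha*exp(alpha*x), x < 0.
from math import prod

betas = [1.0, 3.0]             # 0 < beta_1 < beta_2
a = [0.2, 0.3]                 # density sum_k a_k exp(-beta_k x) on x > 0
alpha = 1.0
c = 1.0 - sum(ak / bk for ak, bk in zip(a, betas))    # P(xi < 0)

def psi(mu):
    """psi(mu) = E exp(mu * xi) for the assumed law."""
    plus = sum(ak / (bk - mu) for ak, bk in zip(a, betas))
    return plus + c * alpha / (alpha + mu)

def bisect(lo, hi, iters=200):
    """Root of psi(mu) = 1 on (lo, hi), assuming psi(lo) < 1 < psi(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if psi(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

eps = 1e-9
edges = [0.0] + betas          # beta_0 = 0
mus = [bisect(edges[k] + (1e-6 if k == 0 else eps), edges[k + 1] - eps)
       for k in range(len(betas))]

atom = prod(m / b for m, b in zip(mus, betas))         # P(S = 0)
# b_k = residue of prod_j (beta_j - s) mu_j / ((mu_j - s) beta_j) at s = mu_k
b = [atom * prod(bj - mk for bj in betas)
     / prod(mj - mk for j, mj in enumerate(mus) if j != k)
     for k, mk in enumerate(mus)]
total_mass = atom + sum(bk / mk for bk, mk in zip(b, mus))
```

The interlacing $\mu_1 < \beta_1 < \mu_2 < \beta_2$ makes every $b_k$ positive, and the atom together with the integrals $b_k/\mu_k$ should sum to one.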


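12.5.4 Explicit Factorisation of the Function $\mathfrak{v}(\lambda)$ when the Left
Tail of the Distribution F Is an Exponential Polynomial


Now consider the case where the left tail of the distribution F has a density which is
an exponential polynomial (belongs to the class $\mathcal{EP}$). In this case,
$$\varphi^-(\lambda) = \mathbf{E}\big(e^{i\lambda\xi};\ \xi < 0\big) = \frac{P_m(\lambda)}{Q_n(\lambda)},$$
where
$$Q_n(\lambda) = \prod_{k=1}^{K}(\beta_k - i\lambda)^{l_k}, \qquad n = \sum_{k=1}^{K} l_k, \qquad \operatorname{Re}\beta_k < 0, \qquad m < n.$$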
k=1 k=1 


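Theorem 12.5.2 Let there exist $\mathbf{E}\xi < 0$. For the positive component of the canonical
factorisation $\mathfrak{v}(\lambda) = \mathfrak{w}_+(\lambda)\mathfrak{w}_-(\lambda)$ of the function
$$\mathfrak{v}(\lambda) = \frac{(1 - \varphi(\lambda))(i\lambda + 1)}{i\lambda}$$
to be representable as
$$\mathfrak{w}_+(\lambda) = (1 - \varphi(\lambda))R(\lambda),$$
where $R(\lambda)$ is a rational function, it is necessary and sufficient that the function
$\varphi^-(\lambda)$ is rational. If $\varphi^-(\lambda) = P_m(\lambda)/Q_n(\lambda)$ then the function $1 - \varphi(\lambda)$
has precisely $n - 1$ zeros in the half-plane $\operatorname{Im}\lambda > 0$ which we denote by
$-i\mu_1, \ldots, -i\mu_{n-1}$, and
$$R(\lambda) = \frac{Q_n(\lambda)}{i\lambda\prod_{k=1}^{n-1}(\mu_k - i\lambda)}.$$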


Theorem 12.5.2, Corollary 12.5.1 and (12.3.3) imply the following assertion. 


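Corollary 12.5.3 If $\mathbf{E}\xi < 0$ and $\varphi^-(\lambda) = P_m(\lambda)/Q_n(\lambda)$ then
$$\mathbf{E}e^{i\lambda S} = \frac{\mathfrak{w}_+(0)}{\mathfrak{w}_+(\lambda)} = -\frac{\mathbf{E}\xi\,Q_n(0)\,i\lambda\prod_{k=1}^{n-1}(\mu_k - i\lambda)}{(1 - \varphi(\lambda))\,Q_n(\lambda)\prod_{k=1}^{n-1}\mu_k},$$
$$\mathbf{E}e^{i\lambda\chi_-^0} = 1 + \frac{(1 - p)^{-1}\,\mathbf{E}\xi\,Q_n(0)\,i\lambda\prod_{k=1}^{n-1}(\mu_k - i\lambda)}{\prod_{k=1}^{n-1}\mu_k\,Q_n(\lambda)}.$$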




Here the density of $\chi_-^0$ has the same “structure” as the density of $\xi$ does for
$x < 0$.


Proof of Theorem 12.5.2 The proof is close to that of Theorem 12.5.1, but unfortu- 
nately is not its direct consequence. We present here a brief proof of Theorem 12.5.2 
under the simplifying assumption that the distribution F is absolutely continuous. 
Using the scheme of the proof of Theorem 12.5.1, the reader can easily reconstruct 
the argument in the general case. 

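Sufficiency. As in Theorem 12.5.1, we verify that the trajectory of $\mathfrak{v}(\lambda)$,
$-\infty < \lambda < \infty$, does not intersect the ray $\arg\mathfrak{v} = -\pi$, so in our case there exists
$$\operatorname{ind}\mathfrak{v} := \lim_{T\to\infty}\operatorname{ind}_T\mathfrak{v} = 0.$$
Put $\mathfrak{v} := \mathfrak{v}_1\mathfrak{v}_2$, where
$$\mathfrak{v}_1 := \frac{Q_n - P_m - Q_n\varphi^+}{i\lambda(i\lambda - 1)^{n-1}}, \qquad \mathfrak{v}_2 := \frac{(i\lambda + 1)(i\lambda - 1)^{n-1}}{Q_n}.$$
Clearly, $\mathfrak{v}_2 \in \mathcal{K}_- \cap \mathcal{K}$ and has exactly $n - 1$ zeros in $\Pi_-$. Hence, by the argument
principle, $\operatorname{ind}\mathfrak{v}_2 = -(n - 1)$, and
$$\operatorname{ind}\mathfrak{v}_1 = -\operatorname{ind}\mathfrak{v}_2 = n - 1.$$
Since $\mathfrak{v}_1 \in \mathcal{K}$, again using the argument principle we obtain that $\mathfrak{v}_1$, as well
as $1 - \varphi$, has exactly $n - 1$ zeros $-i\mu_1, \ldots, -i\mu_{n-1}$ in $\Pi_+$. Putting
$$\mathfrak{w}_+ = \frac{(1 - \varphi)Q_n}{i\lambda\prod_{k=1}^{n-1}(\mu_k - i\lambda)}, \qquad \mathfrak{w}_- = \frac{(i\lambda + 1)\prod_{k=1}^{n-1}(\mu_k - i\lambda)}{Q_n},$$
we obtain a canonical factorisation.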
Necessity. Similarly to the preceding arguments, the necessity follows from the 
factorisation identity 


1— g(a) =ci(1 —E(e**+; ny. < 00))t_ (a) 


oO | 0 : 
= af e!* dV (x) (« +f e!** g(x) ar), 
0 —co 


where $V(x) = \mathbf{P}(\chi_+ > x;\ \eta_+ < \infty)$ for $x > 0$, $c_i = \mathrm{const}$ and $g(x)$ is an exponential
polynomial. The theorem is proved.


As in Sect. 12.5.1, we do not consider the case $\mathbf{E}\xi > 0$ since it reduces to applying
the aforementioned argument to the random variable $-\xi$.


12.5.5 Explicit Canonical Factorisation for the Function $\mathfrak{v}^0(\lambda)$


The goal of this subsection, as it was in Sects. 12.5.3 and 12.5.4, is to find an explicit
form of the components $\mathfrak{w}^0_\pm(\lambda)$ in the canonical factorisation of the function
$\mathfrak{v}^0(\lambda) = \frac{1 - \varphi(\lambda)}{\lambda^2}(\lambda^2 + 1)$ in (12.5.6) in terms of the zeros of the function $1 - \varphi(\lambda)$
in the case where $\mathbf{E}\xi = 0$ and either $\varphi^+(\lambda)$ or $\varphi^-(\lambda)$ is a rational function. When
$\mathbf{E}\xi = 0$, it is sufficient to consider the case where $\varphi^+(\lambda)$ is rational, i.e. the distribution
F has on the positive half-line a density which is an exponential polynomial, so
that
$$\varphi^+(\lambda) = \frac{P_m(\lambda)}{Q_n(\lambda)}, \qquad Q_n(\lambda) = \prod_{k=1}^{K}(\beta_k - i\lambda)^{l_k}, \qquad n = \sum_{k=1}^{K} l_k.$$
The case where it is the function $\varphi^-(\lambda)$ that is rational is treated by switching to
the random variable $-\xi$.


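Theorem 12.5.3 Let $\mathbf{E}\xi = 0$ and $\mathbf{E}\xi^2 = \sigma^2 < \infty$. For the positive component
$\mathfrak{w}^0_+(\lambda)$ of the canonical factorisation
$$\mathfrak{v}^0(\lambda) = \mathfrak{w}^0_+(\lambda)\mathfrak{w}^0_-(\lambda), \qquad \mathfrak{w}^0_\pm \in \mathcal{K}_\pm \cap \mathcal{K},$$
to be a rational function it is necessary and sufficient that the function $\varphi^+(\lambda) =
\mathbf{E}(e^{i\lambda\xi};\ \xi > 0)$ is rational. If $\varphi^+(\lambda) = P_m(\lambda)/Q_n(\lambda)$ is an uncancellable ratio
of polynomials of degrees $m$ and $n$, respectively, $m < n$, then the function $1 - \varphi(\lambda)$
has exactly $n - 1$ zeros in $\Pi_-$ which we denote by $-i\mu_1, \ldots, -i\mu_{n-1}$, and
$$\mathfrak{w}^0_+(\lambda) = \frac{\prod_{k=1}^{n-1}(\mu_k - i\lambda)\,(1 - i\lambda)}{Q_n(\lambda)}, \qquad \mathfrak{w}^0_-(\lambda) = \frac{(1 - \varphi(\lambda))(i\lambda + 1)Q_n(\lambda)}{\lambda^2\prod_{k=1}^{n-1}(\mu_k - i\lambda)}. \tag{12.5.25}$$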


Relation (12.5.3) and the uniqueness of canonical factorisation imply the follow- 
ing representation. 


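Corollary 12.5.4 Under the conditions of Theorem 12.5.3,
$$\mathbf{E}e^{i\lambda\chi_+} = 1 + \frac{i\lambda\,\mathbf{E}\chi_+\,Q_n(0)\prod_{k=1}^{n-1}(\mu_k - i\lambda)}{Q_n(\lambda)\prod_{k=1}^{n-1}\mu_k}.$$


Proof The corollary follows from (12.5.7), (12.5.25), the uniqueness of canonical
factorisation and the equalities
$$\mathfrak{v}^0_+(0) = \mathbf{E}\chi_+, \qquad \mathfrak{v}^0_+(\lambda) = \frac{\mathfrak{w}^0_+(\lambda)\,\mathbf{E}\chi_+}{\mathfrak{w}^0_+(0)},$$
$$1 - \mathbf{E}e^{i\lambda\chi_+} = \frac{i\lambda\,\mathfrak{v}^0_+(\lambda)}{i\lambda - 1} = -\frac{i\lambda\,\mathbf{E}\chi_+\,Q_n(0)\prod_{k=1}^{n-1}(\mu_k - i\lambda)}{Q_n(\lambda)\prod_{k=1}^{n-1}\mu_k}.$$


Thus, here the “structure” of the density of $\chi_+$ again repeats the structure of the
density of $\xi$ for $x > 0$.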


Proof of Theorem 12.5.3 The proof is similar to that of Theorem 12.5.1.
Sufficiency.
1. In the vicinity of the point $\lambda = 0$, $\lambda \in \Pi$, the value of $\mathfrak{v}^0(\lambda)$ lies in the vicinity
of the point $\sigma^2/2 > 0$ by Property 7.1.5 of ch.f.s. Outside of a neighbourhood of
zero, similarly to (12.5.17), we have
$$\arg(1 - \varphi(\lambda)) \in \Big(-\frac{\pi}{2}, \frac{\pi}{2}\Big), \qquad \arg\frac{\lambda^2 + 1}{\lambda^2} = 0.$$
This, analogously to (12.5.18), implies
$$\operatorname{ind}_T\mathfrak{v}^0 := \frac{1}{2\pi}\int_{-T}^{T} d\big(\arg\mathfrak{v}^0(\lambda)\big) \in (-b/2,\ b/2), \qquad b < 1.$$
JT 


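2. Represent $\mathfrak{v}^0$ as $\mathfrak{v}^0 = \mathfrak{v}_1\mathfrak{v}_2$, where
$$\mathfrak{v}_1 := \frac{(i\lambda + 1)^n}{Q_n}, \qquad \mathfrak{v}_2 := \frac{(1 - \varphi(\lambda))(1 - i\lambda)Q_n}{\lambda^2(i\lambda + 1)^{n-1}} = \frac{(Q_n - P_m - Q_n\varphi^-)(1 - i\lambda)}{\lambda^2(i\lambda + 1)^{n-1}}. \tag{12.5.26}$$
Then, similarly to (12.5.20), we find that
$$\operatorname{ind}_T\mathfrak{v}_1 \to n \quad \text{as } T \to \infty, \qquad |n + \operatorname{ind}_T\mathfrak{v}_2| < \frac{1}{2}.$$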


3. We show that $1 - \varphi(\lambda)$ has exactly $n - 1$ zeros in $\Pi_-$. To this end, note that
the function $\mathfrak{v}_2$, which is analytic in $\Pi_-$ and continuous on $\Pi_- \cup \Pi$, has exactly $n$
zeros in $\Pi_-$. As in the proof of Theorem 12.5.1, consider the contour $\Pi_T$. In the
same way as in the argument in this proof, we obtain that
$$\frac{1}{2\pi}\int_{\Pi_T} d(\arg\mathfrak{v}_2) = n,$$
so that $\mathfrak{v}_2$ has exactly $n$ zeros in $\Pi_-$. Further, by (12.5.26) we have $\mathfrak{v}_2 = \mathfrak{v}_3\mathfrak{v}_4$,
where the function $\mathfrak{v}_3 = (1 - i\lambda)/(i\lambda + 1)$ has one zero in $\Pi_-$ at the point $\lambda = -i$.
Therefore the function
$$\mathfrak{v}_4 = \frac{Q_n - P_m - Q_n\varphi^-}{\lambda^2(i\lambda + 1)^{n-2}},$$
which is analytic in $\Pi_-$, has $n - 1$ zeros there. Since the zeros of $1 - \varphi(\lambda)$ and those
of $\mathfrak{v}_4(\lambda)$ in $\Pi_-$ coincide, the assertion concerning the zeros of $1 - \varphi(\lambda)$ is proved.

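4. It remains to put
$$\mathfrak{w}^0_+(\lambda) = \frac{\prod_{k=1}^{n-1}(\mu_k - i\lambda)\,(1 - i\lambda)}{Q_n(\lambda)}, \qquad \mathfrak{w}^0_-(\lambda) = \frac{(1 - \varphi(\lambda))(i\lambda + 1)Q_n(\lambda)}{\lambda^2\prod_{k=1}^{n-1}(\mu_k - i\lambda)}$$
and note that $\mathfrak{w}^0_\pm \in \mathcal{K}_\pm \cap \mathcal{K}$.


Necessity is proved in exactly the same way as in Theorems 12.5.1 and 12.5.2.
The theorem is proved.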


12.6 Explicit Form of Factorisation in the Arithmetic Case 


The content of this section is similar to that of Sect. 12.5 and has the same structure, 
but there are also some significant differences. 




12.6.1 Preliminary Remarks on the Uniqueness of Factorisation 


As was already noted in Sect. 12.5, for arithmetic distributions defined by collections
of probabilities $p_k = \mathbf{P}(\xi = k)$, we should use, instead of the ch.f.s $\varphi(\lambda)$, the
generating functions
$$p(z) = \mathbf{E}z^\xi = \sum_{k=-\infty}^{\infty} z^k p_k$$
defined on the unit circle $|z| = 1$, which will be denoted by $\Pi$, as the axis $\operatorname{Im}\lambda = 0$
was in Sect. 12.5. The symbols $\Pi_+$ ($\Pi_-$) will denote the interior (exterior) of $\Pi$.
For arithmetic distributions we will discuss the factorisation
$$1 - p(z) = f_+(z)f_-(z)$$
on the unit circle, where $f_\pm$ are analytic on $\Pi_\pm$ and continuous including the boundary
$\Pi$. Similarly to the non-lattice case, the classes of such functions that, moreover,
are bounded and bounded away from zero on $\Pi_\pm$, we will denote by $\mathcal{K}_\pm$.
Continuous bounded functions on $\Pi$, which are also bounded away from zero, form
the class $\mathcal{K}$. The notion of canonical factorisation on $\Pi$ is introduced in exactly the
same way as above. Factorisation components must belong to the classes $\mathcal{K}_\pm$. The
uniqueness of factorisation components (up to a constant factor) is proved in the
same way as in Lemma 12.1.1.

We now show that if, similarly to the above, we “tweak” the function 1 — p(z) 
then it will admit a canonical factorisation. We will denote the tweaked function and 
its factorisation components by the same symbols as in Sect. 12.5. This will not lead 
to any confusion. 


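Lemma 12.6.1 1. If $\mathbf{E}\xi < 0$ then the function
$$\mathfrak{v}(z) = \frac{(1 - p(z))z}{1 - z}$$
belongs to $\mathcal{K}$ and admits a unique canonical factorisation
$$\mathfrak{v}(z) = \mathfrak{v}_+(z)\mathfrak{v}_-(z),$$
where
$$\mathfrak{v}_+(z) := 1 - \mathbf{E}\big(z^{\chi_+};\ \eta_+ < \infty\big) = \frac{1 - p}{\mathbf{E}z^S}, \quad p := \mathbf{P}(\eta_+ < \infty), \qquad \mathfrak{v}_-(z) := \frac{(1 - \mathbf{E}z^{\chi_-^0})z}{1 - z}.$$
2. If $\mathbf{E}\xi = 0$ and $\mathbf{E}\xi^2 < \infty$ then the function
$$\mathfrak{v}^0(z) = \frac{(1 - p(z))z}{(1 - z)^2}$$
belongs to $\mathcal{K}$ and admits a unique canonical factorisation
$$\mathfrak{v}^0(z) = \mathfrak{v}^0_+(z)\mathfrak{v}^0_-(z),$$
where
$$\mathfrak{v}^0_+(z) := \frac{1 - \mathbf{E}z^{\chi_+}}{1 - z}, \qquad \mathfrak{v}^0_-(z) := \frac{(1 - \mathbf{E}z^{\chi_-^0})z}{1 - z}.$$


Here we do not discuss the case $\mathbf{E}\xi > 0$ since it reduces to the case $\mathbf{E}\xi < 0$. We
will also not present an analogue of Corollary 12.5.1 in view of its obviousness.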


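Proof of Lemma 12.6.1 Let $\mathbf{E}\xi < 0$. Since
$$-\mathfrak{v}(z) = -\frac{(1 - p(z))z}{1 - z} \to -\mathbf{E}\xi > 0$$
as $z \to 1$, $p(z)$ is continuous on the compact $\Pi$ and, furthermore, $|p(z)| < 1$ for
$z \ne 1$, we see that $\mathfrak{v}(z)$ is bounded away from zero on $\Pi$ and bounded, and hence
belongs to $\mathcal{K}$. Further, by Corollary 12.2.2 (see (12.2.1) for $e^{i\lambda} = z$),
$$\mathfrak{v}(z) = \frac{(1 - \mathbf{E}z^{\chi_-^0})z}{1 - z}\big[1 - \mathbf{E}(z^{\chi_+};\ \eta_+ < \infty)\big],$$
where $\mathbf{E}\chi_-^0 \in (-\infty, 0)$. Therefore, similarly to the above, we get
$$\mathfrak{v}_-(z) = \frac{(1 - \mathbf{E}z^{\chi_-^0})z}{1 - z} \in \mathcal{K}.$$
Moreover, it is obvious that $\mathfrak{v}_-(z) \in \mathcal{K}_-$. In the same way as above, we obtain that
$$\mathfrak{v}_+(z) = 1 - \mathbf{E}(z^{\chi_+};\ \eta_+ < \infty) \in \mathcal{K}_+ \cap \mathcal{K}.$$
This proves the first assertion of the lemma.
The second assertion is proved similarly by using Corollary 12.2.4, by which
$$\mathfrak{v}^0(z) = \frac{1 - \mathbf{E}z^{\chi_+}}{1 - z}\cdot\frac{(1 - \mathbf{E}z^{\chi_-^0})z}{1 - z}.$$
Next, as before, we establish that $\mathfrak{v}^0 \in \mathcal{K}$ and that the factors on the right-hand side,
denoted by $\mathfrak{v}^0_\pm(z)$, belong to $\mathcal{K}_\pm \cap \mathcal{K}$. The lemma is proved.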


12.6.2 The Classes of Distributions on the Positive Half-Line with 
Rational Generating Functions 


The content of Sect. 12.5.2 is mostly preserved here. Now by exponential polynomials
we mean the sequences
$$p_x = \sum_{k=1}^{K}\sum_{j=1}^{l_k} a_{kj}\,x^{j-1}q_k^x, \qquad x = 1, 2, \ldots, \tag{12.6.1}$$
where $q_k < 1$ are different (cf. (12.5.10)). To probabilities $p_x$ of such type will
correspond rational functions
$$p^+(z) = \mathbf{E}(z^\xi;\ \xi > 0) = \sum_{x=1}^{\infty} z^x p_x = \frac{P_m(z)}{Q_n(z)},$$
where $1 \le m \le n$, $n = \sum_{k=1}^{K} l_k$, and, for definiteness, we put
$$Q_n(z) = \prod_{k=1}^{K}(1 - q_kz)^{l_k}. \tag{12.6.2}$$

k=1 


Here a significant difference from the non-lattice case is that, for $p^+(z)$ to be
rational, we do not need (12.6.1) to be valid for all $x > 0$. It is sufficient that (12.6.1)
holds for all $x$, starting from some $r + 1 \ge 1$. The first $r$ probabilities $p_1, \ldots, p_r$ can
be arbitrary. In this case $p^+(z)$ will have the form
$$p^+(z) = \frac{P_m(z)}{Q_n(z)} + T_r(z) = \frac{P_M(z)}{Q_n(z)}, \tag{12.6.3}$$
where $T_r$ is a polynomial of degree $r$ (for $r = 0$ we put $T_0 = 0$), so that $p^+$ is again
a rational function, but now the degree of the polynomial $P_M$
$$M = \begin{cases} m, & \text{if } r = 0, \\ n + r, & \text{if } r \ge 1 \end{cases} \tag{12.6.4}$$
in the numerator can be greater than the degree $n$ of the polynomial in the denominator.
In what follows, we only assume that $n + r > 0$, so that the value $n = 0$ is
allowed (in this case there will be no exponential part in (12.6.1)). In that case we
will assume that $Q_0 = 1$ and $P_m = 0$. The distributions corresponding to (12.6.3)
will also be called exponential polynomials.
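
To make representation (12.6.3) concrete, here is a small sketch for the simplest case $K = 1$, $l_1 = 1$ (a purely geometric tail $p_x = cq^x$ for $x > r$) with $r = 2$ arbitrary first probabilities. All numerical values and helper names are assumptions chosen for illustration. The code builds $T_r$, forms the numerator $P_M = P_m + T_rQ_n$ and compares the rational function with the power series.

```python
# Sketch for Sect. 12.6.2: p_x = c*q**x for x > r, arbitrary p_1..p_r (values ASSUMED).
q, c = 0.5, 0.2
first = [0.05, 0.15]                  # arbitrary p_1, p_2, so r = 2
r = len(first)

def p(x):
    """Probability P(xi = x) for x >= 1."""
    return first[x - 1] if x <= r else c * q ** x

# T_r(z) = sum_{x <= r} (p_x - c q^x) z^x, stored as coefficients of z^0 .. z^r
T = [0.0] + [first[x - 1] - c * q ** x for x in range(1, r + 1)]
# P_m(z)/Q_n(z) = c q z / (1 - q z): here m = n = 1
P, Q = [0.0, c * q], [1.0, -q]

def polymul(u, v):
    out = [0.0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] += ui * vj
    return out

def polyadd(u, v):
    n = max(len(u), len(v))
    return [(u[i] if i < len(u) else 0.0) + (v[i] if i < len(v) else 0.0)
            for i in range(n)]

def polyval(coeffs, z):
    return sum(ck * z ** k for k, ck in enumerate(coeffs))

P_M = polyadd(P, polymul(T, Q))       # numerator of (12.6.3) over the denominator Q_n
M = max(k for k, ck in enumerate(P_M) if abs(ck) > 1e-15)   # its actual degree

z = 0.7
series = sum(p(x) * z ** x for x in range(1, 400))
rational = polyval(P_M, z) / polyval(Q, z)
```

With these values the numerator has degree $M = n + r = 3$, in agreement with (12.6.4).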


12.6.3 Explicit Canonical Factorisation of the Function $\mathfrak{v}(z)$ in
the Case when the Right Tail of the Distribution F Is an
Exponential Polynomial


Consider an arithmetic distribution F on the whole real line $(-\infty, \infty)$, $\mathbf{E}\xi < 0$,
which is an exponential polynomial on the half-line $x > 0$. As before, denote the
class of all such distributions by $\mathcal{EP}$. The generating function $p(z)$ of the distribution
$\mathbf{F} \in \mathcal{EP}$ can be represented as
$$p(z) = p^+(z) + p^-(z),$$
where the function
$$p^-(z) = \mathbf{E}(z^\xi;\ \xi \le 0)$$
is analytic in $\Pi_-$ and continuous including the boundary $\Pi$, and $p^+(z)$ is a rational
function
$$p^+(z) = \frac{P_M(z)}{Q_n(z)},$$
analytic in $\Pi_+$.
As above, in this case the canonical factorisation of the function
$$\mathfrak{v}(z) = \frac{(1 - p(z))z}{1 - z}$$
can be found in explicit form in terms of the zeros of the function $1 - p(z)$.


Theorem 12.6.1 Let there exist $\mathbf{E}\xi < 0$. For the positive component $\mathfrak{w}_+(z)$ of the
canonical factorisation
$$\mathfrak{v}(z) = \mathfrak{w}_+(z)\mathfrak{w}_-(z), \qquad \mathfrak{w}_\pm \in \mathcal{K}_\pm \cap \mathcal{K},$$
to be a rational function it is necessary and sufficient that $p^+(z) = \mathbf{E}(z^\xi;\ \xi > 0)$ is
a rational function.

If $p^+ = P_M/Q_n$, where $M$ is defined in (12.6.4), is an uncancellable ratio of
polynomials then the function $1 - p(z)$ has in $\Pi_-$ exactly $n + r$ zeros, which will be
denoted by $z_1, \ldots, z_{n+r}$, and
$$\mathfrak{w}_+(z) = \frac{\prod_{k=1}^{n+r}(z_k - z)}{Q_n(z)},$$
where $Q_n(z_k) \ne 0$.
If we arrange the zeros $\{z_k\}$ according to the values of $|z_k|$ in ascending order,
then the point $z_1 > 1$ is a simple real zero.


The theorem implies that
$$\mathfrak{w}_-(z) = \frac{(1 - p(z))z\,Q_n(z)}{(1 - z)\prod_{k=1}^{n+r}(z_k - z)}.$$

By Lemma 12.6.1, from Theorem 12.6.1 we obtain the following representation. 


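Corollary 12.6.1 If $\mathbf{E}\xi < 0$ and $p^+ = P_M/Q_n$ then
$$\mathbf{E}z^S = \frac{\mathfrak{w}_+(1)}{\mathfrak{w}_+(z)} = \frac{Q_n(z)\prod_{k=1}^{n+r}(z_k - 1)}{Q_n(1)\prod_{k=1}^{n+r}(z_k - z)}.$$


Similarly to (12.5.16), we can also write down the explicit form of $\mathbf{E}z^{\chi_-^0}$ and
$\mathbf{E}(z^{\chi_+};\ \eta_+ < \infty)$ as well.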


Proof of Theorem 12.6.1 The proof is similar to that of Theorem 12.5.1.
Sufficiency.
1. In the vicinity of the point $z = 1$ in $\Pi$ the value of $-\mathfrak{v}(z)$ lies in the vicinity of
the point $-\mathbf{E}\xi > 0$. Outside a neighbourhood of the point $z = 1$ we have for $z \in \Pi$,
$$\arg(1 - p(z)) \in \Big(-\frac{\pi}{2}, \frac{\pi}{2}\Big), \qquad \arg\frac{z}{z - 1} = \arg\frac{1}{1 - z^{-1}} \in \Big(-\frac{\pi}{2}, \frac{\pi}{2}\Big).$$
This implies that, for $z \in \Pi$,
$$\arg(-\mathfrak{v}(z)) \in (-\pi, \pi),$$
and hence the trajectory of $-\mathfrak{v}(z)$, $z \in \Pi$, never intersects the ray $\arg\mathfrak{v} = -\pi$,
$$\operatorname{ind}\mathfrak{v} := \frac{1}{2\pi}\int_0^{2\pi} d\big(\arg\mathfrak{v}(e^{i\lambda})\big) = 0.$$
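2. Represent the function $\mathfrak{v}$ as $\mathfrak{v} = \mathfrak{v}_1\mathfrak{v}_2$, where
$$\mathfrak{v}_1(z) = \frac{z^{n+r}}{Q_n(z)}, \qquad \mathfrak{v}_2(z) = \frac{Q_n(z) - P_M(z) - p^-(z)Q_n(z)}{(1 - z)z^{n+r-1}}.$$
We show that
$$\operatorname{ind}\mathfrak{v}_2 = -n - r. \tag{12.6.5}$$
In order to do this, we first note that the function $\mathfrak{v}_1$ is analytic in $\Pi_+$ and has there
a zero of multiplicity $n + r$. Hence by the argument principle $\operatorname{ind}\mathfrak{v}_1 = n + r$. Since
$0 = \operatorname{ind}\mathfrak{v} = \operatorname{ind}\mathfrak{v}_1 + \operatorname{ind}\mathfrak{v}_2$, we obtain the desired relation.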

3. We show that $1 - p(z)$ has exactly $n + r$ zeros in $\Pi_-$. The function $\mathfrak{v}_2(z)$ is
analytic on $\Pi_-$ and continuous including the boundary $\Pi$. The positively oriented
contour $\Pi$, which contains $\Pi_+$, corresponds to the negatively oriented contour with
respect to $\Pi_-$. By (12.6.5) this means that $\mathfrak{v}_2(z)$ has precisely $n + r$ zeros on $\Pi_-$
while the point $z = \infty$ is not a zero since the numerator and the denominator of $\mathfrak{v}_2(z)$
grow as $|z|^{n+r}$ as $|z| \to \infty$.


4. Denote the zeros of $\mathfrak{v}_2$ by $z_1, \ldots, z_{n+r}$ and put
$$\mathfrak{w}_+(z) = \frac{\prod_{k=1}^{n+r}(z_k - z)}{Q_n(z)}, \qquad \mathfrak{w}_-(z) = \frac{(1 - p(z))z\,Q_n(z)}{(1 - z)\prod_{k=1}^{n+r}(z_k - z)}.$$
It is easy to see that $\mathfrak{w}_\pm \in \mathcal{K}_\pm \cap \mathcal{K}$. The fact that $Q_n(z_k) \ne 0$ and $z_1$ is a simple real
zero of $1 - p(z)$ is proved in the same way as in Theorem 12.5.1.

Necessity is also established in the same fashion as in Theorem 12.5.1. The theorem
is proved.


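Clearly, in the arithmetic case we have complete analogues of Examples 12.5.1
and 12.5.2. In particular, if
$$\mathbf{P}(\xi = k) = cq^{k-1}, \qquad c < (1 - q), \qquad k = 1, 2, \ldots,$$
then
$$\mathfrak{w}_+(z) = \frac{z_1 - z}{1 - qz}, \qquad \mathbf{E}z^S = \frac{(1 - qz)(z_1 - 1)}{(z_1 - z)(1 - q)},$$
$$\mathbf{P}(S = 0) = \frac{(z_1 - 1)z_1^{-1}}{1 - q}, \qquad \mathbf{P}(S = k) = \frac{(z_1 - 1)(z_1^{-1} - q)}{1 - q}\,z_1^{-k}, \quad k \ge 1.$$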
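
The geometric case $\mathbf{P}(\xi = k) = cq^{k-1}$, $k \ge 1$, can be checked numerically. The sketch below assumes (purely for illustration) that all the remaining mass $d = 1 - c/(1-q)$ sits at the point $-1$; then $1 - p(z) = 0$ reduces to a quadratic with roots $1$ and $z_1 = d/(c+q)$. The check uses the standard fact that $S = \sup_{k \ge 0} S_k$ has the same distribution as $\max(0, S + \xi)$ with $\xi$ independent of $S$. Function names are hypothetical.

```python
# Sketch: P(xi = k) = c q^(k-1), k >= 1, and (an ASSUMPTION) all remaining mass d at -1.
q, c = 0.5, 0.1
d = 1.0 - c / (1.0 - q)               # P(xi = -1)
mean_xi = c / (1.0 - q) ** 2 - d      # E xi, must be negative

# 1 - p(z) = 0 reduces to (c + q) z^2 - (1 + d q) z + d = 0 with roots 1 and z1.
z1 = d / (c + q)

def prob_S(k):
    """P(S = k) read off from E z^S = (1 - q z)(z1 - 1) / ((z1 - z)(1 - q))."""
    if k == 0:
        return (z1 - 1.0) / ((1.0 - q) * z1)
    return (z1 - 1.0) * (1.0 / z1 - q) * z1 ** (-k) / (1.0 - q)

def p_xi(j):
    """P(xi = j) for the assumed law."""
    if j >= 1:
        return c * q ** (j - 1)
    return d if j == -1 else 0.0

def lindley_rhs(j, K=200):
    """P(max(0, S + xi) = j) for S independent of xi (sum truncated at K terms)."""
    if j == 0:
        # S + xi <= 0 requires xi = -1 and S in {0, 1}
        return d * (prob_S(0) + prob_S(1))
    return sum(prob_S(k) * p_xi(j - k) for k in range(K))

residuals = [abs(prob_S(j) - lindley_rhs(j)) for j in range(20)]
total = sum(prob_S(k) for k in range(2000))
```

For these values $z_1 = 4/3$ lies in $(1, 1/q)$, the residuals vanish up to rounding, and the probabilities sum to one.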


In contrast to Sect. 12.5, here one can give another example where the distribution 
of S is geometric. 




Example 12.6.1 Let $\mathbf{P}(\xi = 1) = p_1 > 0$ and $\mathbf{P}(\xi \ge 2) = 0$. In this case $\chi_+ = 1$ on
the set $\{\eta_+ < \infty\}$, and to find the distribution of S there is no need to use Theorem
12.6.1. Indeed, $\mathbf{P}(S = 0) = 1 - p = \mathbf{P}(\eta_+ = \infty)$. If $\eta_+ < \infty$ then the trajectory
$\xi_{\eta_+ + 1}, \xi_{\eta_+ + 2}, \ldots$ is distributed identically to $\xi_1, \xi_2, \ldots$ and hence
$$S = \begin{cases} 0 & \text{with probability } 1 - p, \\ \chi_+ + S_{(1)} & \text{with probability } p, \end{cases}$$
where the variable $S_{(1)}$ is distributed identically to S, $\chi_+ = 1$. This yields
$$\mathbf{E}z^S = (1 - p) + pz\,\mathbf{E}z^S, \qquad \mathbf{E}z^S = \frac{1 - p}{1 - pz},$$
$$\mathbf{P}(S = k) = (1 - p)p^k, \qquad k = 0, 1, \ldots$$
By virtue of identity (12.3.3) (for $e^{i\lambda} = z$) the point $z_1 = p^{-1}$ is necessarily a zero
of the function $1 - p(z)$.
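
A quick numerical illustration of Example 12.6.1: the sketch below takes an assumed three-point law with $\mathbf{P}(\xi = 1) > 0$ and no larger positive jumps, finds the zero $z_1 > 1$ of $1 - p(z)$ by bisection, sets $p = 1/z_1$ and checks that the geometric law $(1-p)p^k$ is invariant under $S \mapsto \max(0, S + \xi)$ (a standard property of the supremum of a random walk). The specific probabilities are hypothetical.

```python
# Sketch for Example 12.6.1 with an ASSUMED three-point law:
# P(xi = 1) = 0.3, P(xi = 0) = 0.2, P(xi = -2) = 0.5.
law = {1: 0.3, 0: 0.2, -2: 0.5}

def gen(z):
    """Generating function p(z) = E z^xi."""
    return sum(pr * z ** k for k, pr in law.items())

# p(z) - 1 is negative just to the right of z = 1 (since E xi < 0) and positive for large z.
lo, hi = 1.0 + 1e-6, 100.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if gen(mid) < 1.0:
        lo = mid
    else:
        hi = mid
z1 = 0.5 * (lo + hi)
p = 1.0 / z1                           # P(eta_+ < infinity)

def prob_S(k):
    """Geometric law of S from the example."""
    return (1.0 - p) * p ** k

def lindley_rhs(j, K=400):
    """P(max(0, S + xi) = j) for S independent of xi (sum truncated at K terms)."""
    if j == 0:
        return sum(prob_S(k) * sum(pr for m, pr in law.items() if m <= -k)
                   for k in range(K))
    return sum(prob_S(k) * law.get(j - k, 0.0) for k in range(K))

residuals = [abs(prob_S(j) - lindley_rhs(j)) for j in range(15)]
```

Here $z_1$ is the positive root of $0.3z^2 - 0.5z - 0.5 = 0$, which is what remains of $p(z) = 1$ after removing the root $z = 1$.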


12.6.4 Explicit Canonical Factorisation of the Function $\mathfrak{v}(z)$ when
the Left Tail of the Distribution F Is an Exponential
Polynomial


We now consider the case where the distribution F on the negative half-line can be
represented as an exponential polynomial, up to the values of $\mathbf{P}(\xi = -k)$ at finitely
many points $0, -1, -2, \ldots, -r$. In this case, the value of $p^-(z)$ is derived similarly
to that of $p^+(z)$ in (12.6.3) by replacing $z$ with $z^{-1}$:
$$p^-(z) = \mathbf{E}(z^\xi;\ \xi \le 0) = \frac{z^{n-M}P_M(z)}{Q_n(z)},$$
where $Q_n$ and $P_M$ are polynomials (which differ from (12.6.3)),
$$M = \begin{cases} m & \text{if } r = 0, \\ n + r & \text{if } r \ge 1, \end{cases} \qquad Q_n(z) = \prod_{k=1}^{K}(z - q_k)^{l_k},$$
and all $q_k < 1$ are distinct.


Theorem 12.6.2 Let there exist $\mathbf{E}\xi < 0$. For the positive component of the canonical
factorisation
$$\mathfrak{v}(z) = \mathfrak{w}_+(z)\mathfrak{w}_-(z)$$
to be representable as
$$\mathfrak{w}_+(z) = (1 - p(z))R(z),$$
where $R(z)$ is a rational function, it is necessary and sufficient that $p^-(z)$ is a
rational function. If
$$p^-(z) = \frac{z^{n-M}P_M(z)}{Q_n(z)},$$
where $P_M$ and $Q_n$ are defined in (12.6.2) and (12.6.3), then the function $1 - p(z)$
has in $\Pi_+$ exactly $n + r - 1$ zeros that we denote by $z_1, \ldots, z_{n+r-1}$, and
$$R(z) := \frac{z^r\,Q_n(z)}{(1 - z)\prod_{k=1}^{n+r-1}(z - z_k)}.$$


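Proof The proof is very close to that of Theorems 12.5.2 and 12.6.1. Therefore we
will only present a brief proof of sufficiency.
1. As in Theorem 12.6.1, one can verify that
$$\operatorname{ind}\mathfrak{v} = 0.$$
2. Represent $\mathfrak{v}(z)$ as $\mathfrak{v} = \mathfrak{v}_1\mathfrak{v}_2$, where
$$\mathfrak{v}_1 := \frac{\big(Q_n(z) - z^{n-M}P_M(z) - p^+(z)Q_n(z)\big)z^r}{1 - z}, \qquad \mathfrak{v}_2(z) := \frac{z^{1-r}}{Q_n(z)}.$$
The function $\mathfrak{v}_2$ is analytic in $\Pi_-$, continuous including the boundary $\Pi$, and has a
zero at $z = \infty$ of multiplicity $n + r - 1$, so that
$$\operatorname{ind}\mathfrak{v}_2 = -(n + r - 1).$$
The function $\mathfrak{v}_1$ is analytic in $\Pi_+$ and, by the argument principle, has there $n + r - 1$
zeros $z_1, \ldots, z_{n+r-1}$. The function $1 - p(z)$ has the same zeros.

3. By putting
$$\mathfrak{w}_+(z) = \frac{(1 - p(z))\,z^r\,Q_n(z)}{(1 - z)\prod_{k=1}^{n+r-1}(z - z_k)}, \qquad \mathfrak{w}_-(z) = \frac{z\prod_{k=1}^{n+r-1}(z - z_k)}{z^r\,Q_n(z)},$$
we obtain $\mathfrak{w}_\pm \in \mathcal{K}_\pm \cap \mathcal{K}$. The theorem is proved.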


12.6.5 Explicit Factorisation of the Function $\mathfrak{v}^0(z)$


By virtue of the remarks at the beginning of Sect. 12.5.5 it is sufficient to consider
factorisation of the function
$$\mathfrak{v}^0(z) = \frac{(1 - p(z))z}{(1 - z)^2}$$
for $\mathbf{E}\xi = 0$ and $\mathbf{E}\xi^2 < \infty$ just in the case when the function
$$p^+(z) = \mathbf{E}(z^\xi;\ \xi > 0) = \frac{P_M(z)}{Q_n(z)}$$
is rational, where $Q_n(z) = \prod_{k=1}^{K}(1 - q_kz)^{l_k}$, $n = \sum_{k=1}^{K} l_k$ (see (12.6.2), (12.6.3)).


Theorem 12.6.3 Let $\mathbf{E}\xi = 0$ and $\mathbf{E}\xi^2 = \sigma^2 < \infty$. For the positive component
$\mathfrak{w}^0_+(z)$ of the canonical factorisation
$$\mathfrak{v}^0(z) = \mathfrak{w}^0_+(z)\mathfrak{w}^0_-(z), \qquad \mathfrak{w}^0_\pm \in \mathcal{K}_\pm \cap \mathcal{K},$$
to be rational, it is necessary and sufficient that the function $p^+(z)$ is rational. If
$p^+(z) = P_M(z)/Q_n(z)$, where $M$ is defined in (12.6.4), is an uncancellable ratio of
polynomials then the function $1 - p(z)$ has in $\Pi_-$ exactly $n + r - 1$ zeros that we
denote by $z_1, \ldots, z_{n+r-1}$, and
$$\mathfrak{w}^0_+(z) = \frac{\prod_{k=1}^{n+r-1}(z_k - z)}{Q_n(z)}, \qquad \mathfrak{w}^0_-(z) = \frac{(1 - p(z))z\,Q_n(z)}{(1 - z)^2\prod_{k=1}^{n+r-1}(z_k - z)}.$$


Corollaries similar to Corollary 12.5.4 hold true here as well. 


Proof of Theorem 12.6.3 The proof is similar to those of Theorems 12.5.3, 12.6.1
and 12.6.2. Therefore, as in the previous theorem, we restrict ourselves to the key
elements of the proof of sufficiency.

1. In the vicinity of the point $z = 1$ in $\Pi$, the value of $-\mathfrak{v}^0(z)$ lies in the vicinity
of the point $\sigma^2/2 > 0$. Outside of a neighbourhood of the point $z = 1$, for $z \in \Pi$ we
have
$$\arg(1 - p(z)) \in \Big(-\frac{\pi}{2}, \frac{\pi}{2}\Big), \qquad \arg\frac{-z}{(1 - z)^2} = \arg\frac{z}{z - 1} + \arg\frac{1}{1 - z} = 0.$$
Hence
$$\operatorname{ind}\mathfrak{v}^0 := \frac{1}{2\pi}\int_0^{2\pi} d\big(\arg\mathfrak{v}^0(e^{i\lambda})\big) = 0.$$
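2. Represent the function $\mathfrak{v}^0(z)$ as
$$\mathfrak{v}^0(z) = \mathfrak{v}_1(z)\mathfrak{v}_2(z),$$
where
$$\mathfrak{v}_1(z) := \frac{z^{n+r-1}}{Q_n(z)}, \qquad \mathfrak{v}_2(z) := \frac{Q_n - P_M - p^-(z)Q_n}{(1 - z)^2z^{n+r-2}}.$$
As before, we show that $\operatorname{ind}\mathfrak{v}_1 = n + r - 1$ and that $1 - p(z)$ has, on $\Pi_-$, exactly
$n + r - 1$ zeros, which are denoted by $z_1, \ldots, z_{n+r-1}$. It remains to put
$$\mathfrak{w}^0_+(z) = \frac{\prod_{k=1}^{n+r-1}(z_k - z)}{Q_n(z)}, \qquad \mathfrak{w}^0_-(z) = \frac{(1 - p(z))z\,Q_n(z)}{(1 - z)^2\prod_{k=1}^{n+r-1}(z_k - z)}.$$
The theorem is proved.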


12.7 Asymptotic Properties of the Distributions of $\chi_\pm$ and $S$


We saw in the previous sections that one can find the distributions of the variables $S$
and $\chi_\pm$ in explicit form only in some special cases. Meanwhile, in applied problems
of, say, risk theory (see Sect. 12.4) one is interested in the values of $\mathbf{P}(S > x)$ for
large $x$ (corresponding to small ruin probabilities). In this connection there arises
the problem on the asymptotic behaviour of $\mathbf{P}(S > x)$ as $x \to \infty$, as well as related
problems on the asymptotics of $\mathbf{P}(|\chi_\pm| > x)$. It turns out that these problems can be
solved under rather broad conditions.


12.7.1 The Asymptotics of $\mathbf{P}(\chi_+ > x \mid \eta_+ < \infty)$ and $\mathbf{P}(\chi_-^0 < -x)$
in the Case $\mathbf{E}\xi \le 0$


We introduce some classes of functions that will be used below. 


Definition 12.7.1 A function $G(t)$ is called (asymptotically) locally constant (l.c.)
if, for any fixed $v$,
$$\frac{G(t + v)}{G(t)} \to 1 \quad \text{as } t \to \infty. \tag{12.7.1}$$


It is not hard to see that, say, the functions $G(t) = t^{\beta}[\ln(1 + t)]^{\gamma}$, $t > 0$, are l.c.

We denote the class of all l.c. functions by $\mathcal{L}$. The properties of functions from $\mathcal{L}$
are studied in Appendix 6. In particular, it is established that (12.7.1) holds uniformly
in $v$ on any fixed segment, and that $G(t) = e^{o(t)}$ and $G(t) = o(G^I(t))$ as
$t \to \infty$, where
$$G^I(t) := \int_t^{\infty} G(u)\,du. \tag{12.7.2}$$


Denote by $\mathcal{E}$ the class of distributions satisfying the right-hand side Cramér condition
(the exponential class). The class $\mathcal{E}^* \subset \mathcal{E}$ of distributions $\mathbf{G}$ whose “tails”
$G(t) = \mathbf{G}((t, \infty))$ satisfy, for any fixed $v > 0$, the relation
$$\frac{G(t + v)}{G(t)} \to 0 \quad \text{as } t \to \infty, \tag{12.7.3}$$
could be called the “superexponential” class. For example, the normal distribution
belongs to $\mathcal{E}^*$. In the arithmetic case, one has to put $v = 1$ in (12.7.3) and consider
integer-valued $t$.

In the case $\mathbf{E}\xi < 0$ it is convenient to introduce a random variable $\chi$ with the
distribution
$$\mathbf{P}(\chi \in dv) = \mathbf{P}(\chi_+ \in dv \mid \eta_+ < \infty) = \frac{\mathbf{P}(\chi_+ \in dv;\ \eta_+ < \infty)}{p}, \qquad p = \mathbf{P}(\eta_+ < \infty).$$
If $\mathbf{E}\xi = 0$ then the distributions of $\chi$ and $\chi_+$ coincide. In the sequel we will confine
ourselves to non-lattice $\xi$ (then $\chi_\pm$ will also be non-lattice). In the arithmetic case
everything will look quite similar.

Denote by $F_+(t)$ the right “tail” of the distribution $\mathbf{F}$: $F_+(t) := \mathbf{F}((t, \infty))$ and
put
$$F_+^I(t) := \int_t^{\infty} F_+(u)\,du.$$



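Theorem 12.7.1 Let there exist $\mathbf{E}\xi \le 0$ and, in the case $\mathbf{E}\xi = 0$, assume $\mathbf{E}\xi^2 < \infty$
holds.


1. If $F_+(t) = o(F_+^I(t))$ as $t \to \infty$ then, as $x \to \infty$,
$$\mathbf{P}(\chi > x) \sim -\frac{F_+^I(x)}{p\,\mathbf{E}\chi_-^0}. \tag{12.7.4}$$
2. If $F_+(t) = V(t)e^{-\beta t}$, $\beta > 0$, $V \in \mathcal{L}$ then
$$\mathbf{P}(\chi > x) \sim \frac{F_+(x)}{p(1 - \mathbf{E}e^{\beta\chi_-^0})}. \tag{12.7.5}$$
3. If $F_+ \in \mathcal{E}^*$ then
$$\mathbf{P}(\chi > x) \sim \frac{F_+(x)}{p\,\mathbf{P}(\chi_-^0 < 0)}. \tag{12.7.6}$$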


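Proof The proof is based on identity (12.2.1) of Corollary 12.2.2, which can be
rewritten as
$$1 - p\,\mathbf{E}e^{i\lambda\chi} = \frac{1 - \varphi(\lambda)}{1 - \varphi_0(\lambda)}, \qquad \varphi_0(\lambda) := \mathbf{E}e^{i\lambda\chi_-^0}. \tag{12.7.7}$$
Introduce the renewal function $H_-(t)$ corresponding to the random variable $\chi_-^0 \le 0$:
$$H_-(t) = \sum_{k=0}^{\infty}\mathbf{P}(H_k \ge t), \qquad H_k = \chi_1^0 + \cdots + \chi_k^0,$$
where $\chi_k^0$ are independent copies of $\chi_-^0$, $a_- := \mathbf{E}\chi_-^0 > -\infty$. As was noted in
Sect. 10.1, the function $1/(1 - \varphi_0(\lambda))$ can be represented as
$$\frac{1}{1 - \varphi_0(\lambda)} = -\int_{-\infty}^{0} e^{i\lambda t}\,dH_-(t)$$
(the function $H_-(t)$ decreases). Therefore, for $x > 0$ and any $N > 0$, we obtain from
(12.7.7) that
$$p\,\mathbf{P}(\chi > x) = -\int_{-\infty}^{0} dH_-(t)\,F_+(x - t) = -\int_{-N}^{0} - \int_{-\infty}^{-N}. \tag{12.7.8}$$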
0° —N —oo 


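Here, by the condition of assertion 1,
$$-\int_{-N}^{0} \le F_+(x)\big[H_-(-N) - H_-(0)\big] = o(F_+^I(x)) \quad \text{as } x \to \infty.$$
Evidently, this relation will still be true when $N \to \infty$ slowly enough as $x \to \infty$.
Furthermore, by the local renewal theorem, as $N \to \infty$,
$$-\int_{-\infty}^{-N} dH_-(t)\,F_+(x - t) \sim \frac{1}{|a_-|}\int_{-\infty}^{-N} F_+(x - t)\,dt = \frac{F_+^I(x + N)}{|a_-|}. \tag{12.7.9}$$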


For a formal justification of this relation, the interval $(-\infty, -N]$ should be divided
into small intervals $(-N_{k+1}, -N_k]$, $k = 0, 1, \ldots$, $N_0 = N$, $N_{k+1} > N_k$, on each of
which we use the local renewal theorem, so that
$$\frac{F_+(x + N_k)(N_{k+1} - N_k)}{|a_-|}(1 + o(1)) \ge -\int_{-N_{k+1}}^{-N_k} dH_-(t)\,F_+(x - t) \ge \frac{F_+(x + N_{k+1})(N_{k+1} - N_k)}{|a_-|}(1 + o(1)).$$
From here it is not difficult to obtain the required bounds for the left-hand side
of (12.7.9) that are asymptotically equivalent to the right-hand side. Since, for $N$
growing slowly enough,
$$F_+^I(x) - F_+^I(x + N) = \int_x^{x+N} F_+(u)\,du < F_+(x)N = o(F_+^I(x)),$$
one has $F_+^I(x + N) \sim F_+^I(x)$, and we finally obtain the relation
$$p\,\mathbf{P}(\chi > x) \sim \frac{F_+^I(x + N)}{|a_-|} \sim \frac{F_+^I(x)}{|a_-|}.$$
This proves (12.7.4).
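If $F_+(t) = V(t)e^{-\beta t}$, $V \in \mathcal{L}$, then we find from (12.7.8) that
$$p\,\mathbf{P}(\chi > x) \sim -V(x)e^{-\beta x}\int_{-\infty}^{0} dH_-(t)\,e^{\beta t} = \frac{V(x)e^{-\beta x}}{1 - \mathbf{E}e^{\beta\chi_-^0}}.$$
This proves (12.7.5).
Now let $F_+ \in \mathcal{E}^*$. If we denote by $h_0 > 0$ the jump of the function $H_-(t)$ at the
point 0 then, clearly,
$$-\int_{-\infty}^{0} dH_-(t)\,\frac{F_+(x - t)}{F_+(x)} \to h_0 \quad \text{as } x \to \infty,$$
and hence
$$p\,\mathbf{P}(\chi > x) \sim F_+(x)h_0.$$
If we put $q := \mathbf{P}(\chi_-^0 = 0)$ then $h_0$, being the average time spent by the random
walk $\{H_k\}$ at the point 0, equals
$$h_0 = \sum_{k=0}^{\infty} q^k = \frac{1}{1 - q}.$$
The theorem is proved.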


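Now consider the asymptotics of $\mathbf{P}(\chi_-^0 < -x)$ as $x \to \infty$.
Put $F_-(t) := \mathbf{F}((-\infty, -t)) = \mathbf{P}(\xi < -t)$.


Theorem 12.7.2 Let $\mathbf{E}\xi < 0$.


1. If $F_- \in \mathcal{L}$ then, as $x \to \infty$,
$$\mathbf{P}(\chi_-^0 < -x) \sim \frac{F_-(x)}{1 - p}.$$
2. If $F_-(t) = e^{-\gamma t}V(t)$, $V(t) \in \mathcal{L}$, then
$$\mathbf{P}(\chi_-^0 < -x) \sim \frac{\mathbf{E}e^{-\gamma S}F_-(x)}{1 - p}.$$
3. If $F_- \in \mathcal{E}^*$ then
$$\mathbf{P}(\chi_-^0 < -x) \sim \frac{F_-(x)\,\mathbf{P}(S = 0)}{1 - p}.$$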

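Proof We make use of identity (12.3.3):
$$1 - \varphi_0(\lambda) = \frac{(1 - \varphi(\lambda))\,\mathbf{E}e^{i\lambda S}}{1 - p}, \qquad \varphi_0(\lambda) = \mathbf{E}e^{i\lambda\chi_-^0}.$$
This implies that $\mathbf{P}(\chi_-^0 < -x)$ is the weighted mean of the value $F_-(x + t)$ with the
weight function $\mathbf{P}(S \in dt)/(1 - p)$:
$$\mathbf{P}(\chi_-^0 < -x) = \frac{1}{1 - p}\int_0^{\infty}\mathbf{P}(S \in dt)\,F_-(x + t).$$
From here the assertions of the theorem follow in an obvious way.


If $\mathbf{E}\xi = 0$ then the asymptotics of $\mathbf{P}(\chi_-^0 < -x)$ will be different.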


12.7.2 The Asymptotics of P(S > x) 


We will study the asymptotics of $\mathbf{P}(S > x)$ in the two non-overlapping and mutually
complementary cases where $F_+ \in \mathcal{E}$ (the Cramér condition holds) and where $F_+^I$
belongs to the class $\mathcal{S}$ of subexponential functions.


Definition 12.7.2 A distribution $\mathbf{G}$ on $[0, \infty)$ with the tail $G(t) := \mathbf{G}([t, \infty))$ belongs
to the class $\mathcal{S}_+$ of subexponential distributions on the positive half-line if
$$G^{2*}(t) \sim 2G(t) \quad \text{as } t \to \infty. \tag{12.7.10}$$
A distribution $\mathbf{G}$ on the whole real line belongs to the class $\mathcal{S}$ of subexponential
distributions if the distribution $\mathbf{G}^+$ of the positive part $\zeta^+ = \max\{0, \zeta\}$ of a random
variable $\zeta$ with distribution $\mathbf{G}$ belongs to $\mathcal{S}_+$. A random variable is called subexponential
if its distribution is subexponential.


As we will see later (see Theorem A6.4.3 in Appendix 6), the subexponentiality of a
distribution $\mathbf{G}$ is in essence a property of the asymptotics of the tail $G(t)$ as
$t \to \infty$. Therefore we can also talk about subexponential functions. A nonincreasing
function $G_1(t)$ on $(0, \infty)$ is called subexponential if the distribution $\mathbf{G}$ with a tail
$G(t)$ such that $G(t) \sim cG_1(t)$ as $t \to \infty$ for some $c > 0$ is subexponential. (For
example, distributions with tails $G_1(t)/G_1(0)$ or $\min(1, G_1(t))$ if $G_1(0) > 1$.)

The properties of subexponential distributions are studied in Appendix 6. In particular,
it is established that $\mathcal{S} \subset \mathcal{L}$, $\mathcal{R} \subset \mathcal{S}$ ($\mathcal{R}$ is the class of regularly varying functions)
and that $G(t) = o(G^I(t))$ if $G^I \in \mathcal{S}$.


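Theorem 12.7.3 If $F_+^I(t) \in \mathcal{S}$ and $a = \mathbf{E}\xi < 0$, then, as $x \to \infty$,
$$\mathbf{P}(S > x) \sim -\frac{1}{a}F_+^I(x). \tag{12.7.11}$$

Proof Making use of the identity from Theorem 12.3.2:
$$\mathbf{E}e^{i\lambda S} = \frac{1 - p}{1 - p\varphi_\chi(\lambda)}, \qquad \varphi_\chi(\lambda) := \mathbf{E}e^{i\lambda\chi}, \tag{12.7.12}$$
we find that
$$\mathbf{E}e^{i\lambda S} = (1 - p)\sum_{k=0}^{\infty} p^k\varphi_\chi^k(\lambda),$$
and hence, for $x > 0$,
$$\mathbf{P}(S > x) = (1 - p)\sum_{k=1}^{\infty} p^k\,\mathbf{P}(H_k > x), \qquad H_k := \sum_{j=1}^{k}\chi_j, \tag{12.7.13}$$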


where $\chi_j$ are independent copies of $\chi$. By assertion 1 of Theorem 12.7.1 the distribution
of $\chi$ is subexponential, while by Theorem A6.4.3 of Appendix 6, as $x \to \infty$,
for each fixed $k$ one has
$$\mathbf{P}(H_k > x) \sim k\,\mathbf{P}(\chi > x). \tag{12.7.14}$$
Moreover, again by Theorem A6.4.3 of Appendix 6, for any $\varepsilon > 0$, there exists a
$b = b(\varepsilon)$ such that, for all $x$ and $k \ge 2$,
$$\frac{\mathbf{P}(H_k > x)}{\mathbf{P}(\chi > x)} < b(1 + \varepsilon)^k.$$
Therefore, for $(1 + \varepsilon)p < 1$, the series
$$\sum_{k=1}^{\infty} p^k\,\frac{\mathbf{P}(H_k > x)}{\mathbf{P}(\chi > x)}$$
converges uniformly in $x$. Passing to the limit as $x \to \infty$, by virtue of (12.7.14) we
obtain that
$$\frac{\mathbf{P}(S > x)}{\mathbf{P}(\chi > x)} \to (1 - p)\sum_{k=1}^{\infty} kp^k = \frac{p}{1 - p}$$
or, which is the same, that
$$\mathbf{P}(S > x) \sim \frac{p\,\mathbf{P}(\chi > x)}{1 - p} \quad \text{as } x \to \infty,$$
where, by Theorem 12.7.1,
$$\mathbf{P}(\chi > x) \sim -\frac{F_+^I(x)}{p\,\mathbf{E}\chi_-^0}.$$
Since, by Corollary 12.2.3,
$$(1 - p)\,\mathbf{E}\chi_-^0 = \mathbf{E}\xi,$$
we obtain (12.7.11). The theorem is proved.
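
To see what approximation (12.7.11) looks like in a concrete case, the sketch below assumes a Pareto-type right tail $F_+(t) = (1+t)^{-\alpha}$ with $\alpha > 1$ (regularly varying, hence covered by $\mathcal{R} \subset \mathcal{S}$) and an assumed value of $a = \mathbf{E}\xi < 0$. The tail index, the value of $a$ and the function names are illustrative choices, not taken from the text. The closed form of the integrated tail is cross-checked by a simple midpoint rule.

```python
# Sketch for Theorem 12.7.3 with an ASSUMED Pareto-type right tail F_+(t) = (1 + t)^(-alpha).
alpha = 2.5          # assumed tail index, alpha > 1 so that F_+^I is finite
a = -0.4             # assumed value of a = E xi < 0

def F_plus(t):
    return (1.0 + t) ** (-alpha)

def F_plus_I(x):
    """Integrated tail: integral of F_+(u) over (x, infinity), in closed form."""
    return (1.0 + x) ** (1.0 - alpha) / (alpha - 1.0)

def F_plus_I_numeric(x, steps=200_000):
    """Midpoint rule after the substitution u = x + s/(1 - s), 0 < s < 1."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        total += F_plus(x + s / (1.0 - s)) / (1.0 - s) ** 2
    return total * h

def tail_of_S_approx(x):
    """Right-hand side of (12.7.11): -(1/a) * F_+^I(x)."""
    return -F_plus_I(x) / a
```

For this tail $F_+(x)/F_+^I(x) = (\alpha - 1)/(1 + x) \to 0$, which is the condition of assertion 1 of Theorem 12.7.1; the approximation is only asymptotic, so its quality at moderate $x$ is not guaranteed.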


Now consider the case when F satisfies the right-hand side Cramér condition
($\mathbf{F} \in \mathcal{E}$). For definiteness, we will again assume that the distribution F is non-lattice.
Furthermore, we will assume that there exists a $\mu_1 > 0$ such that
$$\psi(\mu_1) = \mathbf{E}e^{\mu_1\xi} = 1, \qquad b := \mathbf{E}\xi e^{\mu_1\xi} = \psi'(\mu_1) < \infty. \tag{12.7.15}$$
In this case the Cramér transform of the distribution F at the point $\mu_1$ will be of
the form
$$\mathbf{F}_{(\mu_1)}(dt) = \frac{e^{\mu_1 t}\,\mathbf{F}(dt)}{\psi(\mu_1)} = e^{\mu_1 t}\,\mathbf{F}(dt). \tag{12.7.16}$$
A random variable $\xi_{(\mu_1)}$ with the distribution $\mathbf{F}_{(\mu_1)}$ has, by (12.7.15), a finite expectation
equal to $b$. Denote the size of the first overshoot of the level $x$ by a random
walk with jumps $\xi_{(\mu_1)}$ by $\chi_{(\mu_1)}(x)$. By Corollary 10.4.1, the distribution of $\chi_{(\mu_1)}(x)$
converges, as $x \to \infty$, to the limiting distribution: $\chi_{(\mu_1)}(x) \Rightarrow \chi_{(\mu_1)}$, so that
$$\mathbf{E}e^{-\mu_1\chi_{(\mu_1)}(x)} \to \mathbf{E}e^{-\mu_1\chi_{(\mu_1)}}. \tag{12.7.17}$$


Theorem 12.7.4 Let $F_+ \in \mathcal{E}$ and (12.7.15) be satisfied. Then, as $x \to \infty$,
$$\mathbf{P}(S > x) \sim ce^{-\mu_1 x}, \tag{12.7.18}$$
where $c = \mathbf{E}e^{-\mu_1\chi_{(\mu_1)}} < 1$.
There is a somewhat different interpretation of the constant $c$ in Remark 15.2.3.
Exact upper and lower bounds for $e^{\mu_1 x}\,\mathbf{P}(S > x)$ are contained in Theorem 15.3.5.


Note that the finiteness of $\mathbf{E}\xi < 0$ is not assumed in Theorem 12.7.4. In the
arithmetic case, we have to consider only integer $x$.


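Proof Put $\eta(x) := \min\{n \ge 1 : S_n > x\}$, $X_n := x_1 + \cdots + x_n$, and $\overline{X}_n := \max_{k \le n} X_k$.
Then
$$\mathbf{P}(S > x) = \mathbf{P}(\eta(x) < \infty) = \sum_{n=1}^{\infty}\mathbf{P}(\eta(x) = n), \tag{12.7.19}$$
where
$$\begin{aligned}
\mathbf{P}(\eta(x) = n) &= \underbrace{\int\cdots\int}_{n}\mathbf{F}(dx_1)\cdots\mathbf{F}(dx_n)\,\mathrm{I}(\overline{X}_{n-1} \le x,\ X_n > x) \\
&= \underbrace{\int\cdots\int}_{n} e^{-\mu_1 X_n}\,\mathbf{F}_{(\mu_1)}(dx_1)\cdots\mathbf{F}_{(\mu_1)}(dx_n)\,\mathrm{I}(\overline{X}_{n-1} \le x,\ X_n > x) \\
&= \mathbf{E}_{(\mu_1)}e^{-\mu_1 S_n}\,\mathrm{I}(\eta(x) = n).
\end{aligned} \tag{12.7.20}$$
Here $\mathbf{E}_{(\mu_1)}$ denotes the expectation when taken assuming that the distribution of the
summands $\xi_i$ is $\mathbf{F}_{(\mu_1)}$. By the convexity of the function $\psi(\mu) = \mathbf{E}e^{\mu\xi}$,
$$\mathbf{E}_{(\mu_1)}\xi = \int xe^{\mu_1 x}\,\mathbf{F}(dx) = \psi'(\mu_1) = b > 0,$$
and hence
$$\mathbf{P}_{(\mu_1)}(\eta(x) < \infty) = 1.$$
Therefore, returning to (12.7.19), we obtain
$$\mathbf{P}(S > x) = \mathbf{E}_{(\mu_1)}\sum_{n=1}^{\infty} e^{-\mu_1 S_n}\,\mathrm{I}(\eta(x) = n) = \mathbf{E}_{(\mu_1)}e^{-\mu_1 S_{\eta(x)}}, \tag{12.7.21}$$
where $S_{\eta(x)} = x + \chi_{(\mu_1)}(x)$ and, by (12.7.17),
$$e^{\mu_1 x}\,\mathbf{P}(S > x) \to c = \mathbf{E}e^{-\mu_1\chi_{(\mu_1)}} < 1.$$
This proves (12.7.18). For arithmetic $\xi$ the proof is the same. We only have to replace
$\mathbf{F}(dt)$ in (12.7.15) and (12.7.16) by $p_k = \mathbf{P}(\xi = k)$, as well as integration by
summation. The theorem is proved.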


Corollary 12.7.1 If, in the arithmetic case, $\mathbf{E}\xi < 0$, $p_1 = \mathbf{P}(\xi = 1) > 0$, $\mathbf{P}(\xi \ge 2) = 0$
then the conditions of Theorem 12.7.4 are satisfied and one has
$$\mathbf{P}(S > x) = e^{-\mu_1(x + 1)}, \qquad x \ge 0.$$
Proof The proof follows immediately from (12.7.21) if we note that, in the case
under consideration, $\chi_{(\mu_1)}(x) \equiv 1$ and $S_{\eta(x)} = x + 1$. This assertion repeats the result
of Example 12.6.1.


Remark 12.7.1 The asymptotics (12.7.18), obtained by a probabilistic argument, 
admits a simple analytic interpretation. From (12.7.18) it follows that, as wt t41, 
we have 
pet iy 
hi- wb 
But that Ee”® has precisely this form follows from identity (12.3.3): 


(1 = p)(1 — Ee#X") 


Eels = 
1—y(u) 
Indeed, since, by assumption, w (wu) = Ee" is left-differentiable at the point jz; and 
W(w) = 1 — b(u1 — w) + 0((u41 — )), (12.7.22) 


one has 


0 
Fels ~w (l= _ (12.7.23) 


380 12 Random Walks and Factorisation Identities 


as 4 t (41. This implies, in particular, yet another representation for the constant c 
in (12.7.18): 


_ = py — Eet**) 
_ - 


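Since
$$\mathbf{E}e^{i\lambda S} = \frac{\mathfrak{w}_+(0)}{\mathfrak{w}_+(\lambda)}$$
and $\mathfrak{w}_+(\lambda)$ has a zero at the point $\lambda = -i\mu_1$, we can obtain representations similar to
(12.7.22) and (12.7.23) in terms of the values of $\mathfrak{w}_+(0)$ and $\mathfrak{w}_+'(-i\mu_1)$.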

We should also note that the proof of asymptotics (12.7.18) with the help of 
relations of the form (12.7.23) is based on certain facts from mathematical analysis 
and is relatively simple only under the additional condition (12.5.1). 

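There are other ways to prove (12.7.18), but they also involve additional
restrictions. For instance, (12.3.3) implies
$$\mathbf{E}e^{i\lambda S} = (1 - p) \sum_{k=0}^{\infty} \big[\varphi^k(\lambda) - \varphi^k(\lambda)\,\mathbf{E}e^{i\lambda\chi_-^0}\big],$$
$$\begin{aligned}
\mathbf{P}(S > x) &= (1 - p) \sum_{k=0}^{\infty} \big[\mathbf{P}(S_k > x) - \mathbf{P}(S_k + \chi_-^0 > x)\big] \\
&= (1 - p) \int_0^{\infty} \mathbf{P}(-\chi_-^0 \in dt) \sum_{k=0}^{\infty} \mathbf{P}\big(S_k \in (x, x + t]\big),
\end{aligned}$$
and the problem now reduces to integro-local theorems for large deviations of $S_k$
(see Chap. 9) or to local theorems for the renewal function in the region where the
function converges to zero.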


12.7.3 The Distribution of the Maximal Values of Generalised 
Renewal Processes 


Let $\{(\tau_j, \zeta_j)\}_{j=1}^{\infty}$ be a sequence of independent identically distributed random
vectors,
$$Z(t) = Z_{\nu(t)},$$
where
$$Z_n := \sum_{j=1}^{n} \zeta_j, \qquad \nu(t) := \max\{k : T_k \le t\}, \qquad T_k := \sum_{j=1}^{k} \tau_j.$$

In Sect. 12.4.3 we reduced the problem of finding the distribution of $\sup_t (Z(t) - qt)$
to that of the distribution of $S := \sup_{k \ge 0} S_k$, $S_k := \sum_{j=1}^{k} \xi_j$, $\xi_j := \zeta_j - q\tau_j$, in the
case $q > 0$, $\zeta_j \ge 0$. We show that such a reduction takes place in the general case
as well. If $q > 0$ and the $\zeta_j$ can take values of both signs, then the reduction is the
same as in Sect. 12.4.3. Now if $q < 0$ then
$$\begin{aligned}
\sup_t (Z(t) - qt) &= \sup(-qT_1,\ Z_1 - qT_2,\ Z_2 - qT_3, \ldots) \\
&= -q\tau_1 + \sup_{k \ge 1}\big[Z_{k-1} - q(T_k - \tau_1)\big] \stackrel{d}{=} S - q\tau_1,
\end{aligned}$$
where the random variables $\tau_1$ and $S$ are independent.


12.8 On the Distribution of the First Passage Time 
12.8.1 The Properties of the Distributions of the Times $\eta_\pm$


In this section we will establish a number of relations between the random
variables $\eta_\pm$ and the time $\theta$ when the global maximum $S = \sup S_k$ is attained for the
first time:
$$\theta := \min\{k : S_k = S\} \quad (\text{if } S < \infty \text{ a.s.}).$$
Put
$$P(z) := \sum_{k=0}^{\infty} z^k \mathbf{P}(\eta_-^0 > k), \qquad q(z) := \mathbf{E}(z^{\eta_+} \mid \eta_+ < \infty),$$


a ee a a > 0) 


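Further, let $\eta$ be a random variable with the distribution
$$\mathbf{P}(\eta = k) = \mathbf{P}(\eta_+ = k \mid \eta_+ < \infty)$$
(and the generating function $q(z)$), $\eta_1, \eta_2, \ldots$ be independent copies of $\eta$,
$$H_k := \eta_1 + \cdots + \eta_k, \qquad H_0 = 0,$$
and $\nu$ be a random variable independent of $\{\eta_k\}$ with the geometric distribution
$$\mathbf{P}(\nu = k) = (1 - p)p^k, \quad k \ge 0.$$


Theorem 12.8.1 If $p = \mathbf{P}(\eta_+ < \infty) < 1$ then

1. $$1 - p = \frac{1}{\mathbf{E}\eta_-^0} = \exp\Big\{-\sum_{k=1}^{\infty} \frac{\mathbf{P}(S_k > 0)}{k}\Big\}. \tag{12.8.1}$$
2. $$P(z) = \frac{1}{1 - pq(z)} = \frac{\mathbf{E}z^{\theta}}{1 - p}. \tag{12.8.2}$$
3. $$\mathbf{P}(\eta_-^0 > n) = (1 - p)^{-1}\mathbf{P}(H_\nu = n) \ge \mathbf{P}(H_\nu = n) \tag{12.8.3}$$
for all $n \ge 0$.


Recall that, for the condition $p < 1$ to hold, it is sufficient that $\mathbf{E}\xi < 0$ (see
Corollary 12.2.6).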




The second assertion of the theorem implies that the distributions of $\eta_-^0$, $\eta_+$ and $\theta$
uniquely determine each other, so that if at least one of them is known then, to find
the other two, it is not necessary to know the original distribution $F$. In particular,
$$\mathbf{P}(\theta = n) = (1 - p)\mathbf{P}(\eta_-^0 > n).$$


Proof of Theorem 12.8.1 The arguments in this subsection are based on the
following identities which follow from Theorems 12.1.1–12.1.3 if we put there $\lambda = 0$ and
$|z| < 1$:
$$1 - z = \big[1 - \mathbf{E}z^{\eta_-^0}\big]\big[1 - \mathbf{E}(z^{\eta_+};\ \eta_+ < \infty)\big], \tag{12.8.4}$$
$$1 - \mathbf{E}z^{\eta_-^0} = \exp\Big\{-\sum_{k=1}^{\infty} \frac{z^k}{k}\mathbf{P}(S_k \le 0)\Big\}, \tag{12.8.5}$$
$$1 - \mathbf{E}(z^{\eta_+};\ \eta_+ < \infty) = \exp\Big\{-\sum_{k=1}^{\infty} \frac{z^k}{k}\mathbf{P}(S_k > 0)\Big\}. \tag{12.8.6}$$
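Since
$$\frac{1 - \mathbf{E}z^{\eta_-^0}}{1 - z} = P(z), \qquad P(1) = \mathbf{E}\eta_-^0,$$
we obtain from (12.8.4) the first equalities in (12.8.1) and (12.8.2). The second
equality in (12.8.1) follows from (12.8.6) with $z = 1$.

To prove the second equality in (12.8.2), we make use of the relation
$$\theta = \begin{cases} 0 & \text{on } \{\omega : \eta_+ = \infty\}, \\ \eta_+ + \theta^* & \text{on } \{\omega : \eta_+ < \infty\}, \end{cases}$$
where $\theta^*$ is distributed on $\{\eta_+ < \infty\}$ identically to $\theta$ and does not depend on $\eta_+$. It
follows that
$$\mathbf{E}z^{\theta} = (1 - p) + \mathbf{E}z^{\theta}\,\mathbf{E}(z^{\eta_+};\ \eta_+ < \infty).$$
Since $\mathbf{E}(z^{\eta_+};\ \eta_+ < \infty) = pq(z)$, this implies the second equality in (12.8.2). The
last assertion of the theorem follows from the first equality in (12.8.2) and the
relation $p^k = (1 - p)^{-1}\mathbf{P}(\nu = k)$, which imply
$$\begin{aligned}
P(z) = \sum_{k=0}^{\infty} p^k q^k(z) &= (1 - p)^{-1} \sum_{k=0}^{\infty} \mathbf{P}(\nu = k) \sum_{n=0}^{\infty} \mathbf{P}(H_k = n) z^n \\
&= (1 - p)^{-1} \sum_{n=0}^{\infty} z^n \mathbf{P}(H_\nu = n).
\end{aligned}$$
The theorem is proved.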


The second equality in (12.8.2) and identity (12.7.12) mean that the
representations
$$\theta = \eta_1 + \cdots + \eta_\nu \quad \text{and} \quad S = \chi_1 + \cdots + \chi_\nu,$$
respectively, hold true, where $\nu$ has the geometric distribution
$\mathbf{P}(\nu = k) = (1 - p)p^k$, $k \ge 0$, and does not depend on $\{\eta_j\}$, $\{\chi_j\}$.

Note that the probabilities $\mathbf{P}(S_k > 0) = \mathbf{P}(S_k - ak > -ak)$ on the right-hand
sides of (12.8.5) and (12.8.6) are, for large $k$ and $a = \mathbf{E}\xi < 0$, the probabilities of
large deviations that were studied in Chap. 9. The results of that chapter on the
asymptotics of these probabilities together with relations (12.8.5) and (12.8.6) give
us an opportunity to find the asymptotics of $\mathbf{P}(\eta_+ = n)$ and $\mathbf{P}(\eta_-^0 = n)$ as $n \to \infty$
(see [8]).

Now consider the case where both random variables $\eta_-^0$ and $\eta_+$ are proper.
That is always the case if $\mathbf{E}\xi = 0$ (see Corollary 12.2.6). Here identities (12.8.4)–
(12.8.6) hold true (with $\mathbf{P}(\eta_+ < \infty) = 1$). As before, (12.8.4) implies that the
distributions of $\eta_-^0$ and $\eta_+$ uniquely determine each other.

Let $\eta_1, \eta_2, \ldots$ be independent copies of $\eta_+$, $H_k = \eta_1 + \cdots + \eta_k$ and $H_0 = 0$. For
the sums $H_k$, define the local renewal function
$$h_n = \sum_{k=0}^{\infty} \mathbf{P}(H_k = n).$$

Theorem 12.8.2 If $\mathbf{P}(\eta_-^0 < \infty) = \mathbf{P}(\eta_+ < \infty) = 1$ then:
1. $\mathbf{E}\eta_-^0 = \mathbf{E}\eta_+ = \infty$.
2. $\mathbf{P}(\eta_-^0 > n) = h_n$.

Proof From (12.8.4) it follows that
$$P(z) = \frac{1 - \mathbf{E}z^{\eta_-^0}}{1 - z} = \frac{1}{1 - \mathbf{E}z^{\eta_+}} \to \infty \tag{12.8.7}$$
as $z \to 1$. Since $P(z) \uparrow \mathbf{E}\eta_-^0$ as $z \uparrow 1$, we have proved that $\mathbf{E}\eta_-^0$ is infinite. That
$\mathbf{E}\eta_+$ is also infinite is shown in the same way. The second assertion also follows
from (12.8.7) since the right-hand side of (12.8.7) is $\sum_{n=0}^{\infty} z^n h_n$. The theorem is
proved.


Now we turn to the important class of symmetric distributions. We will say that
the distribution of a random variable $\xi$ is symmetric if it coincides with the
distribution of $-\xi$, and will call the distribution of $\xi$ continuous if the distribution function
of $\xi$ is continuous. For such random variables, $\mathbf{E}\xi = 0$ (if $\mathbf{E}\xi$ exists), the
distributions of $S_n$ are also symmetric continuous for all $n$, and
$$\mathbf{P}(S_n > 0) = \mathbf{P}(S_n < 0) = \frac{1}{2}, \qquad \mathbf{P}(S_n = 0) = 0,$$
and hence $D(z) \equiv 1$, $\mathbf{P}(\chi_-^0 = 0) = 0$, and $\eta_- = \eta_-^0$, $\chi_- = \chi_-^0$ with probability 1.


Theorem 12.8.3 If the distribution of $\xi$ is symmetric and continuous then
$$\mathbf{P}(\eta_+ = n) = \mathbf{P}(\eta_- = n) = \frac{(2n)!}{(2n - 1)(n!)^2 2^{2n}} \sim \frac{1}{2\sqrt{\pi}\, n^{3/2}}, \tag{12.8.8}$$
$$\mathbf{P}(\gamma_n > 0) = \mathbf{P}(\zeta_n < 0) \sim \frac{1}{\sqrt{\pi n}}$$
as $n \to \infty$ ($\gamma_n$ and $\zeta_n$ are defined in Section 12.1.3).




Proof Since $\mathbf{E}z^{\eta_-^0} = \mathbf{E}z^{\eta_+}$, by virtue of (12.8.4) one has
$$1 - \mathbf{E}z^{\eta_+} = \sqrt{1 - z}.$$
Expanding $\sqrt{1 - z}$ into a series, we obtain the second equality in (12.8.8). The
asymptotic equivalence follows from Stirling's formula.
The second assertion of the theorem follows from the first one and the equality
$$\mathbf{P}(\zeta_n < 0) = \sum_{k=n+1}^{\infty} \mathbf{P}(\eta_+ = k).$$
The assertions concerning $\eta_-$ and $\gamma_n$ follow by symmetry.
The theorem is proved. 
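
The series expansion used in the proof can be checked with exact rational arithmetic. The sketch below is only an illustration; the function names `first_passage_prob`, `sqrt_series_coeffs` and `tail_prob` are our own and do not come from the text. It compares the closed form in (12.8.8) with the coefficients $c_n$ in $\sqrt{1 - z} = 1 - \sum_n c_n z^n$, and evaluates the tail $\mathbf{P}(\eta_+ > n)$ that appears in the second assertion.

```python
from fractions import Fraction
from math import comb, factorial, pi, sqrt


def first_passage_prob(n: int) -> Fraction:
    """Closed form from (12.8.8): (2n)! / ((2n - 1) (n!)^2 2^(2n))."""
    return Fraction(factorial(2 * n), (2 * n - 1) * factorial(n) ** 2 * 4 ** n)


def sqrt_series_coeffs(n_max: int) -> list:
    """Coefficients c_1, ..., c_{n_max} in sqrt(1 - z) = 1 - sum_n c_n z^n."""
    coeffs = []
    binom = Fraction(1)  # generalised binomial coefficient binom(1/2, 0)
    for n in range(1, n_max + 1):
        binom *= (Fraction(1, 2) - (n - 1)) / n  # now equals binom(1/2, n)
        coeffs.append(-binom * (-1) ** n)
    return coeffs


def tail_prob(n: int) -> Fraction:
    """P(eta_+ > n) = 1 - sum_{k <= n} P(eta_+ = k)."""
    return 1 - sum(first_passage_prob(k) for k in range(1, n + 1))


if __name__ == "__main__":
    for n in (1, 2, 3, 10, 100):
        exact = first_passage_prob(n)
        approx = 1 / (2 * sqrt(pi) * n ** 1.5)
        print(n, float(exact), approx, float(tail_prob(n)), 1 / sqrt(pi * n))
```

The tail computed this way coincides with $\binom{2n}{n} 2^{-2n}$, which is one way to see the $1/\sqrt{\pi n}$ asymptotics.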


Note that, under the conditions of Theorem 12.8.3, the distributions of the
variables $\eta_+$, $\eta_-$, $\gamma_n$, $\zeta_n$ do not depend on the distribution of $\xi$. Also note that the
asymptotics


1 
2/7 n3/2 
persists in the case of non-symmetric distributions as well provided that $\mathbf{E}\xi = 0$ and
$\mathbf{E}\xi^2 < \infty$ (see [8]).


P(n4 =n) ~ 


12.8.2. The Distribution of the First Passage Time of an Arbitrary 
Level x by Arithmetic Skip-Free Walks 


The main object in this section is the time
$$\eta(x) = \min\{k : S_k \ge x\}$$
of the first passage of the level $x$ by the random walk $\{S_k\}$. Below we will consider
the class of arithmetic random walks for which $\chi_+ \equiv 1$.

By an arithmetic skip-free walk we will call a sequence $\{S_k\}_{k=0}^{\infty}$, where the
distribution of $\xi$ is arithmetic and $\max_\omega \xi(\omega) = 1$ (i.e. $p_1 > 0$ and $p_k = 0$ for $k \ge 2$,
where $p_k = \mathbf{P}(\xi = k)$). The term "skip-free walk" appears due to the fact that the
walk $\{S_k\}$, $k = 0, 1, \ldots$, cannot skip any integer level $x > 0$: if $S_n > x$ then
necessarily there is a $k < n$ such that $S_k = x$.

As we already know from Example 12.6.1, for skip-free walks with $\mathbf{E}\xi < 0$ the
distribution of $S$ is geometric:
$$\mathbf{P}(S = k) = (1 - p)p^k, \quad k = 0, 1, \ldots,$$
where $p = \mathbf{P}(\eta_+ < \infty)$ and $z_1 = p^{-1}$ is the zero of the function $1 - \mathfrak{p}(z)$ with
$$\mathfrak{p}(z) = \sum_{k=-\infty}^{1} z^k p_k.$$
It turns out that one can find many other explicit formulas for skip-free
walks. In this section we will be interested in the distribution of the maximum
$\overline{S}_n = \max(0, S_1, \ldots, S_n)$; as we already noted, knowing the distribution is
important for many problems of mathematical statistics, queueing theory, etc. Note that
finding the distribution of $\overline{S}_n$ is the same as finding the distribution of $\eta(x)$, since
$$\{\overline{S}_n < x\} = \{\eta(x) > n\}. \tag{12.8.9}$$
Here we put $\eta(x) := \infty$ if $S < x$.

The Pollaczek–Spitzer identity (see Theorem 12.3.1) provides the double
transform of the distribution of $\overline{S}_n$. Analysing this identity shows that the distribution of
$\overline{S}_n$ (or $\eta(x)$) itself typically cannot be expressed in terms of the distribution of $\xi$ in
explicit form. However, for discrete skip-free walks one has remarkable "duality"
relations which we will now prove with the help of the Pollaczek–Spitzer identity.


Theorem 12.8.4 If $\xi$ is integer-valued then $\mathbf{P}(\xi \ge 2) = 0$ is a necessary and
sufficient condition for
$$n\mathbf{P}(\eta(x) = n) = x\mathbf{P}(S_n = x), \quad x \ge 1. \tag{12.8.10}$$


Using the Wald identity, it is also not hard to verify that if the expectation
$\mathbf{E}\xi_1 = a > 0$ exists then the walk $\{S_n\}$ will be skip-free if and only if $\mathbf{E}\eta(x) = x/a$. (Note
that the definition of $\eta(x)$ in this section somewhat differs from that in Chap. 10.
One obtains it by changing $x$ to $x + 1$ on the right-hand side of the definition of $\eta(x)$
from Chap. 10.)

The asymptotics of the local probabilities $\mathbf{P}(S_n = x)$ was studied in Chap. 9 (see
e.g., Theorem 9.3.4). This together with (12.8.10) enables us to find the asymptotics
of $\mathbf{P}(\eta(x) = n)$.


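Proof of Theorem 12.8.4 Set
$$r_x := \mathbf{P}(\eta(x) = \infty) = \mathbf{P}(S < x), \qquad q_{x,n} := \mathbf{P}(\eta(x) = n),$$
$$Q_{x,n} := \mathbf{P}(\eta(x) > n) = \sum_{k=n+1}^{\infty} q_{x,k} + r_x.$$
Since for each $y$, $0 \le y \le x$,
$$\{\eta(x) = n\} \subset \bigcup_{k=0}^{n} \{\eta(y) = k\},$$
using the fact that the walk is skip-free, by the total probability formula one has
$$q_{x,n} = \sum_{k=0}^{n} q_{y,k}\, q_{x-y,n-k},$$
where $q_{0,0} = 1$, and $q_{y,0} = 0$ for $y > 0$. Hence for $|z| \le 1$ using convolution we have
$$q_x(z) := \sum_{n=0}^{\infty} q_{x,n} z^n = \mathbf{E}(z^{\eta(x)};\ \eta(x) < \infty) = q_y(z)\, q_{x-y}(z).$$
Putting $y = 1$, $q(z) := q_1(z)$ and noting that $q_0(z) = 1$, we obtain
$$q_x(z) = q(z)\, q_{x-1}(z) = q^x(z), \quad x \ge 0.$$
From here one can find the generating function $Q_x(z)$ of the sequence $Q_{x,n}$:
$$\begin{aligned}
Q_x(z) &= \sum_{n=0}^{\infty} z^n \Big(r_x + \sum_{k=n+1}^{\infty} q_{x,k}\Big) = \frac{r_x}{1 - z} + \sum_{k=1}^{\infty} q_{x,k} \sum_{n=0}^{k-1} z^n \\
&= \frac{r_x}{1 - z} + \sum_{k=1}^{\infty} q_{x,k}\, \frac{1 - z^k}{1 - z} = \frac{r_x}{1 - z} + \frac{q_x(1) - q_x(z)}{1 - z} = \frac{1 - q_x(z)}{1 - z}.
\end{aligned}$$
Note that here the quantity $q_x(1) = \mathbf{P}(\eta(x) < \infty) = \mathbf{P}(S \ge x)$ can be less than 1.
Using (12.8.9) we obtain that
$$\mathbf{P}(\overline{S}_n = x) = \mathbf{P}(\eta(x + 1) > n) - \mathbf{P}(\eta(x) > n),$$
$$\sum_{n=0}^{\infty} z^n \mathbf{P}(\overline{S}_n = x) = \frac{1 - q^{x+1}(z)}{1 - z} - \frac{1 - q^x(z)}{1 - z} = \frac{q^x(z)(1 - q(z))}{1 - z}.$$
Finally, making use of the absolute summability of the series below, we find that,
for $|v| < 1$ and $|z| < 1$,
$$\sum_{n=0}^{\infty} z^n \mathbf{E}v^{\overline{S}_n} = \sum_{x=0}^{\infty} v^x \sum_{n=0}^{\infty} z^n \mathbf{P}(\overline{S}_n = x) = \frac{1 - q(z)}{(1 - z)(1 - vq(z))}.$$
Turning now to the Pollaczek–Spitzer formula, we can write that
$$\sum_{n=1}^{\infty} \frac{z^n}{n}\, \mathbf{E}v^{\max(0, S_n)} = \ln \frac{1 - q(z)}{(1 - z)(1 - vq(z))} = \ln \frac{1 - q(z)}{1 - z} + \sum_{x=1}^{\infty} \frac{(vq(z))^x}{x}.$$
Comparing the coefficients of $v^x$, $x \ge 1$, we obtain
$$\sum_{n=1}^{\infty} \frac{z^n}{n}\, \mathbf{P}(S_n = x) = \frac{q^x(z)}{x}. \tag{12.8.11}$$
Taking into account that $q^x(z) = q_x(z)$ and comparing the coefficients of $z^n$, $n \ge 1$,
in (12.8.11) we get
$$\frac{1}{n}\mathbf{P}(S_n = x) = \frac{1}{x}\mathbf{P}(\eta(x) = n), \quad x \ge 1,\ n \ge 1.$$
Sufficiency is proved.

The necessity of the condition $\mathbf{P}(\xi \ge 2) = 0$ follows from equality (12.8.10) for
$x = n = 1$:
$$p_1 = q_{1,1} = \sum_{k=1}^{\infty} p_k, \qquad \sum_{k=2}^{\infty} p_k = \mathbf{P}(\xi \ge 2) = 0.$$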
The theorem is proved. 
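
Identity (12.8.10) is easy to test numerically on small cases. The following sketch is an illustration only: the helper names `convolve`, `first_passage_table` and `identity_holds` and the two step distributions are our own choices, not taken from the text. It computes $\mathbf{P}(\eta(x) = n)$ by propagating the distribution of the walk restricted to paths that have not yet reached the level $x$, and compares $n\mathbf{P}(\eta(x) = n)$ with $x\mathbf{P}(S_n = x)$ using exact fractions.

```python
from fractions import Fraction


def convolve(dist, step_probs):
    """Distribution of (current position + one independent step)."""
    out = {}
    for s, p in dist.items():
        for step, q in step_probs.items():
            out[s + step] = out.get(s + step, Fraction(0)) + p * q
    return out


def first_passage_table(step_probs, x, n_max):
    """Rows (n, P(eta(x) = n), P(S_n = x)) for n = 1..n_max, where
    eta(x) = min{k : S_k >= x} and x >= 1."""
    free = {0: Fraction(1)}   # distribution of S_n
    alive = {0: Fraction(1)}  # S_n restricted to {S_1 < x, ..., S_n < x}
    rows = []
    for n in range(1, n_max + 1):
        free = convolve(free, step_probs)
        moved = convolve(alive, step_probs)
        hit = sum((p for s, p in moved.items() if s >= x), Fraction(0))
        alive = {s: p for s, p in moved.items() if s < x}
        rows.append((n, hit, free.get(x, Fraction(0))))
    return rows


def identity_holds(step_probs, x_max, n_max):
    """Check n P(eta(x) = n) == x P(S_n = x) for 1 <= x <= x_max, n <= n_max."""
    return all(
        n * hit == x * free
        for x in range(1, x_max + 1)
        for n, hit, free in first_passage_table(step_probs, x, n_max)
    )


if __name__ == "__main__":
    quarter = Fraction(1, 4)
    skip_free = {1: quarter, 0: quarter, -1: quarter, -2: quarter}
    with_jump = {2: Fraction(1, 2), -1: Fraction(1, 2)}  # P(xi >= 2) > 0
    print(identity_holds(skip_free, 3, 8), identity_holds(with_jump, 3, 8))
```

For the second distribution the identity already fails at $x = n = 1$, in line with the necessity part of the proof.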




Using the obtained formulas one can, for instance, find in Example 4.2.3 the
distribution of the time to ruin in a game with an infinitely rich adversary (the total
capital being infinite). If the initial capital of the first player is $x$ then, for the time
$\eta(x)$ of his ruin, we obtain
$$\mathbf{P}(\eta(x) = n) = \frac{x}{n}\mathbf{P}(S_n = x),$$
where
$$S_n = \sum_{j=1}^{n} \xi_j; \qquad \mathbf{P}(\xi_j = 1) = q, \quad \mathbf{P}(\xi_j = -1) = p$$
($p$ is the probability for the first player to win in a single play). Therefore, if $n$ and
$x$ are both either odd or even then
$$\mathbf{P}(\eta(x) = n) = \frac{x}{n}\binom{n}{(n + x)/2} q^{(n+x)/2} p^{(n-x)/2} \tag{12.8.12}$$
and $\mathbf{P}(\eta(x) = n) = 0$ otherwise.

It is interesting to ask how fast $\mathbf{P}(\eta(x) > n)$ decreases as $n$ grows in the case
when the player will be ruined with probability 1, i.e. when $\mathbf{P}(\eta(x) < \infty) = 1$. As
we already know, this happens if and only if $p \le q$. (The assertion also follows from
the results of Sect. 13.3.)

Applying Stirling's formula, as was done when proving the local limit theorem
for the Bernoulli scheme, it is not difficult to obtain from (12.8.12) that, for each
fixed $x$, as $n \to \infty$ ($n$ and $x$ having the same parity), for $p \le q$,
$$\mathbf{P}(\eta(x) = n) \sim x\sqrt{\frac{2}{\pi}}\; n^{-3/2} \Big(\frac{q}{p}\Big)^{x/2} (4pq)^{n/2}$$
and
$$\mathbf{P}(\eta(x) > n) \sim x\sqrt{\frac{2}{\pi}}\; n^{-3/2} \Big(\frac{q}{p}\Big)^{x/2} \frac{(4pq)^{n/2+1}}{1 - 4pq} \quad \text{for } p < q,$$
$$\mathbf{P}(\eta(x) > n) \sim x\sqrt{\frac{2}{\pi n}} \quad \text{for } p = q.$$

The last relation allowed us, under the conditions of Sect. 8.8, to obtain the
limiting distribution for the number of intersections of the trajectory $S_1, \ldots, S_n$ with
the strip $[u, v]$ (see (8.8.24)). Up to the normalising constants, this assertion also
remains true for arbitrary random walks such that $\mathbf{E}\xi_k = 0$ and $\mathbf{E}\xi_k^2 < \infty$. However,
even in the case of a skip-free walk, the proof of this assertion requires additional
efforts, despite the fact that, for such walks, an upward intersection of the line $x = 0$
by the trajectory $\{S_n\}$ divides the trajectory, as in Sect. 8.8, into independent
identically distributed cycles.


Chapter 13 
Sequences of Dependent Trials. Markov Chains 


Abstract The chapter opens in Sect. 13.1 with the key definitions and
first examples of countable Markov chains. The section also contains the
classification of states of the chain. Section 13.2 contains necessary and sufficient conditions
for recurrence of states, the Solidarity Theorem for irreducible Markov chains and 
a theorem on the structure of a periodic Markov chain. Key theorems on random 
walks on lattices are presented in Sect. 13.3, along with those for a general sym- 
metric random walk on the real line. The ergodic theorem for general countable 
homogeneous chains is established in Sect. 13.4, along with its special case for fi- 
nite Markov chains and the Law of Large Numbers and the Central Limit Theorem 
for the number of visits to a given state. This is followed by a short Sect. 13.5 de- 
tailing the behaviour of transition probabilities for reducible chains. The last three 
sections are devoted to Markov chains with arbitrary state spaces. First the ergod- 
icity of such chains possessing a positive atom is proved in Sect. 13.6, then the 
concept of Harris Markov chains is introduced and conditions of ergodicity of such 
chains are established in Sect. 13.7. Finally, the Laws of Large Numbers and the 
Central Limit Theorem for sums of random variables defined on a Markov chain are 
obtained in Sect. 13.8. 


13.1 Countable Markov Chains. Definitions and Examples. 
Classification of States 


13.1.1 Definition and Examples 


So far we have studied sequences of independent trials. Now we will consider the 
simplest variant of a sequence of dependent trials. 

Let $G$ be an experiment having a finite or countable set of outcomes $\{E_1, E_2, \ldots\}$.
Suppose we keep repeating the experiment $G$. Denote by $X_n$ the number of the
outcome of the $n$-th experiment.

In general, the probabilities of different values of $E_{X_n}$ can depend on what events
occurred in the previous $n - 1$ trials. If this probability, given a fixed outcome $E_{X_{n-1}}$
of the $(n - 1)$-st trial, does not depend on the outcomes of the preceding $n - 2$ trials,
then one says that this sequence of trials forms a Markov chain.


A.A. Borovkov, Probability Theory, Universitext, 389 
DOI 10.1007/978-1-4471-5201-9_13, © Springer-Verlag London 2013 


To give a precise definition of a Markov chain, consider a sequence of
integer-valued random variables $\{X_n\}_{n=0}^{\infty}$. If the $n$-th trial resulted in outcome $E_j$, we set
$$X_n := j.$$


Definition 13.1.1 A sequence $\{X_n\}_{n=0}^{\infty}$ forms a Markov chain if
$$\mathbf{P}(X_n = j \mid X_0 = k_0, X_1 = k_1, \ldots, X_{n-2} = k_{n-2}, X_{n-1} = i) = \mathbf{P}(X_n = j \mid X_{n-1} = i) =: p_{ij}^{(n)}. \tag{13.1.1}$$


These are the so-called countable (or discrete) Markov chains, i.e. Markov chains 
with countable state spaces. 


Thus, a Markov chain may be thought of as a system with possible states
$\{E_1, E_2, \ldots\}$. Some "initial" distribution of the variable $X_0$ is given:
$$\mathbf{P}(X_0 = j) = p_j^0, \qquad \sum_j p_j^0 = 1.$$
Next, at integer time epochs the system changes its state, the conditional probability
of being at state $E_j$ at time $n$ given the previous history of the system only being
dependent on the state of the system at time $n - 1$. One can briefly characterise this
property as follows: given the present, the future and the past of the sequence $X_n$
are independent.

For example, the branching process $\{\zeta_n\}$ described in Sect. 7.7, where $\zeta_n$ was the
number of particles in the $n$-th generation, is a Markov chain with possible states
$\{0, 1, 2, \ldots\}$.

In terms of conditional expectations or conditional probabilities (see Sect. 4.8),
the Markov property (as we shall call property (13.1.1)) can also be written as
$$\mathbf{P}(X_n = j \mid \sigma(X_0, \ldots, X_{n-1})) = \mathbf{P}(X_n = j \mid \sigma(X_{n-1})),$$
where $\sigma(\cdot)$ is the $\sigma$-algebra generated by random variables appearing in the
argument, or, which is the same,
$$\mathbf{P}(X_n = j \mid X_0, \ldots, X_{n-1}) = \mathbf{P}(X_n = j \mid X_{n-1}).$$
This definition allows immediate extension to the case of a Markov chain with a
more general state space (see Sects. 13.6 and 13.7).

The problem of the existence of a sequence $\{X_n\}_{n=0}^{\infty}$ which is a Markov chain
with given transition probabilities $p_{ij}^{(n)}$ ($p_{ij}^{(n)} \ge 0$, $\sum_j p_{ij}^{(n)} = 1$) and a given "initial"
distribution $\{p_k^0\}$ of the variable $X_0$ can be solved in the same way as for independent
random variables. It suffices to apply the Kolmogorov theorem (see Appendix 2) and
specify consistent joint distributions by
$$\mathbf{P}(X_0 = k_0, X_1 = k_1, \ldots, X_n = k_n) := p_{k_0}^0\, p_{k_0 k_1}^{(1)}\, p_{k_1 k_2}^{(2)} \cdots p_{k_{n-1} k_n}^{(n)},$$
which are easily seen to satisfy the Markov property (13.1.1).




Definition 13.1.2 A Markov chain $\{X_n\}_{n=0}^{\infty}$ is said to be homogeneous if the
probabilities $p_{ij}^{(n)}$ do not depend on $n$.


We consider several examples. 


Example 13.1.1 (Walks with absorption and reflection) Let $a > 1$ be an integer.
Consider a walk of a particle over integers between 0 and $a$. If $0 < k < a$, then from
the point $k$ the particle goes to $k - 1$ or $k + 1$ with probabilities 1/2. If $k$ is equal to 0
or $a$, then the particle remains at the point $k$ with probability 1. This is the so-called
walk with absorption. If $X_n$ is a random variable which is equal to the coordinate
of the particle at time $n$, then the sequence $\{X_n\}$ forms a Markov chain, since the
conditional expectation of the random variable $X_n$ given $X_0, X_1, \ldots, X_{n-1}$ depends
only on the value of $X_{n-1}$. It is easy to see that this chain is homogeneous.

This walk can be used to describe a fair game (see Example 4.2.3) in the case
when the total capital of both gamblers equals $a$. Reaching the point $a$ means the
ruin of the second gambler.

On the other hand, if the particle goes from the point 0 to the point 1 with
probability 1, and from the point $a$ to the point $a - 1$ with probability 1, then we have a
walk with reflection. It is clear that in this case the positions $X_n$ of the particle also
form a homogeneous Markov chain.


Example 13.1.2 Let $\{\xi_k\}_{k=0}^{\infty}$ be a sequence of independent integer-valued random
variables and $d > 0$ be an integer. The random variables $X_n := \sum_{k=0}^{n} \xi_k \pmod d$
obtained by adding $\xi_k$ modulo $d$ ($X_n = \sum_{k=0}^{n} \xi_k - jd$, where $j$ is such that
$0 \le X_n < d$) form a Markov chain. Indeed, we have $X_n = X_{n-1} + \xi_n \pmod d$, and
therefore the conditional distribution of $X_n$ given $X_1, X_2, \ldots, X_{n-1}$ depends only
on $X_{n-1}$.

If, in addition, $\{\xi_k\}$ are identically distributed, then this chain is homogeneous.

Of course, all the aforesaid also holds when $d = \infty$, i.e. for the conventional
summation. The only difference is that the set of possible states of the system is in
this case infinite.


From the definition of a homogeneous Markov chain it follows that the
probabilities $p_{ij}^{(n)}$ of transition from state $E_i$ to state $E_j$ on the $n$-th step do not depend on $n$.
Denote these probabilities by $p_{ij}$. They form the transition matrix $P = \|p_{ij}\|$ with
the properties
$$p_{ij} \ge 0, \qquad \sum_j p_{ij} = 1.$$
The second property is a consequence of the fact that the system, upon leaving the
state $E_i$, enters with probability 1 one of the states $E_1, E_2, \ldots$.

Matrices with the above properties are said to be stochastic.

The matrix $P$ completely describes the law of change of the state of the system
after one step. Now consider the change of the state of the system after $k$ steps. We
introduce the notation $p_{ij}(k) := \mathbf{P}(X_k = j \mid X_0 = i)$. For $k > 1$, the total probability
formula yields
$$p_{ij}(k) = \sum_s \mathbf{P}(X_{k-1} = s \mid X_0 = i)\, p_{sj} = \sum_s p_{is}(k - 1)\, p_{sj}.$$
Summation here is carried out over all states. If we denote by $P(k) := \|p_{ij}(k)\|$ the
matrix of transition probabilities $p_{ij}(k)$, then the above equality means that
$P(k) = P(k - 1)P$ or, which is the same, that $P(k) = P^k$. Thus the matrix $P$ uniquely
determines transition probabilities for any number of steps. It should be added here
that, for a homogeneous chain,
$$\mathbf{P}(X_{n+k} = j \mid X_n = i) = \mathbf{P}(X_k = j \mid X_0 = i) = p_{ij}(k).$$
We see from the aforesaid that the "distribution" of a chain will be completely
determined by the matrix $P$ and the initial distribution $p_k^0 = \mathbf{P}(X_0 = k)$.
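
The relation $P(k) = P(k - 1)P$ can be illustrated on the walk with absorption from Example 13.1.1. The sketch below is a minimal illustration with plain Python lists and exact fractions; the function names `absorbing_walk_matrix`, `mat_mul` and `k_step` are our own.

```python
from fractions import Fraction


def absorbing_walk_matrix(a):
    """Transition matrix of the walk with absorption on {0, 1, ..., a}."""
    half = Fraction(1, 2)
    P = [[Fraction(0)] * (a + 1) for _ in range(a + 1)]
    P[0][0] = P[a][a] = Fraction(1)  # absorbing end points
    for k in range(1, a):
        P[k][k - 1] = P[k][k + 1] = half
    return P


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][s] * B[s][j] for s in range(n)) for j in range(n)]
            for i in range(n)]


def k_step(P, k):
    """P(k) computed by the recursion P(k) = P(k - 1) P, with P(0) = identity."""
    n = len(P)
    result = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, P)
    return result


if __name__ == "__main__":
    P = absorbing_walk_matrix(4)
    for row in k_step(P, 2):
        print([str(entry) for entry in row])
```

Each row of $P(k)$ again sums to 1, and $P(k + m) = P(k)P(m)$, which is the matrix form of the total probability formula above.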

We leave it to the reader as an exercise to verify that, for an arbitrary $k \ge 1$ and
sets $B_1, \ldots, B_{n-k}$,
$$\mathbf{P}(X_n = j \mid X_{n-k} = i;\ X_{n-k-1} \in B_1, \ldots, X_0 \in B_{n-k}) = p_{ij}(k).$$
To prove this relation one can first verify it for $k = 1$ and then make use of induction.

It is obvious that a sequence of independent integer-valued identically distributed
random variables $X_n$ forms a Markov chain with $p_{ij} = p_j = \mathbf{P}(X_n = j)$. Here one
has $P(k) = P^k = P$.


13.1.2 Classification of States¹


Definition 13.1.3 


K1. A state $E_i$ is called inessential if there exist a state $E_j$ and an integer $t_0 > 0$
such that $p_{ij}(t_0) > 0$ and $p_{ji}(t) = 0$ for every integer $t$.
Otherwise the state $E_i$ is called essential.
K2. Essential states $E_i$ and $E_j$ are called communicating if there exist integers
$t > 0$ and $s > 0$ such that $p_{ij}(t) > 0$ and $p_{ji}(s) > 0$.


Example 13.1.3 Assume a system can be in one of the four states $\{E_1, E_2, E_3, E_4\}$
and has the transition matrix 


0 1/2 1/2 0 
ia 0: hh Te 
oO 0 i212 
0 0. 12 12 


¹ Here and in Sect. 13.2 we shall essentially follow the paper by A.N. Kolmogorov [23].




Fig. 13.1 Possible transitions and their probabilities in Example 13.1.3


In Fig. 13.1 the states are depicted by dots, transitions from state to state by
arrows, numbers being the corresponding probabilities. In this chain, the states $E_1$
and $E_2$ are inessential while $E_3$ and $E_4$ are essential and communicating.

In the walk with absorption described in Example 13.1.1, the states $1, 2, \ldots,
a - 1$ are inessential. The states 0 and $a$ are essential but non-communicating, and it
is natural to call them absorbing. In the walk with reflection, all states are essential
and communicating.

Let $\{X_n\}_{n=0}^{\infty}$ be a homogeneous Markov chain. We distinguish the class $S^0$ of
all inessential states. Let $E_i$ be an essential state. Denote by $S_{E_i}$ the class of states
comprising $E_i$ and all states communicating with it. If $E_j \in S_{E_i}$, then $E_j$ is essential
and communicating with $E_i$, and $E_i \in S_{E_j}$. Hence $S_{E_i} = S_{E_j}$. Thus, the whole set
of essential states can be decomposed into disjoint classes of communicating states
which will be denoted by $S^1, S^2, \ldots$
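
For a finite chain the decomposition just described can be computed from reachability alone. The following sketch is an illustration under that finiteness assumption; the function names (`reachable`, `classify`, `absorbing_walk`, `reflecting_walk`) are our own. A state $i$ is declared inessential if some state reachable from $i$ cannot reach $i$ back, as in K1, and the remaining states are grouped into communicating classes as in K2.

```python
def reachable(P):
    """reach[i] = set of j such that p_ij(t) > 0 for some t >= 1."""
    n = len(P)
    reach = [{j for j in range(n) if P[i][j] > 0} for i in range(n)]
    changed = True
    while changed:  # iterate until the transitive closure is reached
        changed = False
        for i in range(n):
            new = set().union(*(reach[j] for j in reach[i])) - reach[i]
            if new:
                reach[i] |= new
                changed = True
    return reach


def classify(P):
    """Return (set of inessential states, list of classes of essential states)."""
    reach = reachable(P)
    n = len(P)
    inessential = {i for i in range(n)
                   if any(i not in reach[j] for j in reach[i])}
    classes = []
    for i in range(n):
        if i in inessential:
            continue
        cls = frozenset({i} | {j for j in reach[i] if i in reach[j]})
        if cls not in classes:
            classes.append(cls)
    return inessential, classes


def absorbing_walk(a):
    """Walk with absorption on {0, ..., a} from Example 13.1.1."""
    P = [[0.0] * (a + 1) for _ in range(a + 1)]
    P[0][0] = P[a][a] = 1.0
    for k in range(1, a):
        P[k][k - 1] = P[k][k + 1] = 0.5
    return P


def reflecting_walk(a):
    """Walk with reflection on {0, ..., a} from Example 13.1.1."""
    P = absorbing_walk(a)
    P[0][0] = P[a][a] = 0.0
    P[0][1] = P[a][a - 1] = 1.0
    return P


if __name__ == "__main__":
    print(classify(absorbing_walk(4)))
    print(classify(reflecting_walk(4)))
```

For the walk with absorption this reproduces the statement above: the interior states are inessential, and the two end points form two separate single-state classes.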


Definition 13.1.4 If the class $S_{E_i}$ consists of the single state $E_i$, then this state is
called absorbing.


It is clear that after a system has hit an essential state $E_i$, it can never leave
the class $S_{E_i}$.


Definition 13.1.5 A Markov chain consisting of a single class of essential com- 
municating states is said to be irreducible. A Markov chain is called reducible if it 
contains more than one such class. 


If we enumerate states so that the states from $S^0$ come first, next come states
from $S^1$ and so on, then the matrix of transition probabilities will have the form
shown in Fig. 13.2. Here the submatrices marked by zeros have only zero entries. 
The cross-hatched submatrices are stochastic. 

Each such submatrix corresponds to some irreducible chain. If, at some time, the 
system is at a state of such an irreducible chain, then the system will never leave this 
chain in the future. Hence, to study the dynamics of an arbitrary Markov chain, it 
is sufficient to study the dynamics of irreducible chains. Therefore one of the basic 
objects of study in the theory of Markov chains is irreducible Markov chains. We 
will consider them now. 




Fig. 13.2 The structure of the matrix of transition probabilities of a general
Markov chain. The class $S^0$ consists of all inessential states, whereas $S^1, S^2, \ldots$ are
closed classes of communicating states


We introduce the following notation:
$$f_j(n) := \mathbf{P}(X_n = j, X_{n-1} \ne j, \ldots, X_1 \ne j \mid X_0 = j), \qquad F_j := \sum_{n=1}^{\infty} f_j(n);$$
$f_j(n)$ is the probability that the system leaving the $j$-th state will return to it for the
first time after $n$ steps. The probability that the system leaving the $j$-th state will
eventually return to it is equal to $F_j$.


Definition 13.1.6 


K3. A state $E_j$ is said to be recurrent (or persistent) if $F_j = 1$, and transient if
$F_j < 1$.

K4. A state $E_j$ is called null if $p_{jj}(n) \to 0$ as $n \to \infty$, and positive otherwise.

K5. A state $E_j$ is called periodic with period $d_j$ if a return to this state has
a positive probability only when the number of steps is a multiple of $d_j > 1$,
and $d_j$ is the maximum number having such property.


In other words, $d_j > 1$ is the greatest common divisor (g.c.d.) of the set of
numbers $\{n : f_j(n) > 0\}$. Note that one can always choose from this set a finite subset
$\{n_1, \ldots, n_k\}$ such that $d_j$ is the greatest common divisor of these numbers. It is also
clear that $p_{jj}(n) = f_j(n) = 0$ if $n \not\equiv 0 \pmod{d_j}$.


Example 13.1.4 Consider a walk of a particle over integer points on the real line
defined as follows. The particle either takes one step to the right or remains on
the spot with probabilities 1/2. Here $f_j(1) = 1/2$, and if $n > 1$ then $f_j(n) = 0$ for
any point $j$. Therefore $F_j < 1$ and all the states are transient. It is easily seen that
$p_{jj}(n) = 1/2^n \to 0$ as $n \to \infty$ and hence every state is null.

On the other hand, if the particle jumps to the right with probability 1/2 and with
the same probability jumps to the left, then we have a chain with period 2, since
a return to any particular state is only possible in an even number of steps.




13.2 Necessary and Sufficient Conditions for Recurrence of 
States. Types of States in an Irreducible Chain. The 
Structure of a Periodic Chain 


Recall that the function
$$a(z) := \sum_{n=0}^{\infty} a_n z^n$$
is called the generating function of the sequence $\{a_n\}_{n=0}^{\infty}$. Here $z$ is a complex
variable. If the sequence $\{a_n\}$ is bounded, then this series converges for $|z| < 1$.


Theorem 13.2.1 A state $E_j$ is recurrent if and only if $P_j = \sum_{n=1}^{\infty} p_{jj}(n) = \infty$. For
a transient $E_j$,
$$F_j = \frac{P_j}{1 + P_j}. \tag{13.2.1}$$


The assertion of this theorem is a kind of extension of the Borel–Cantelli lemma
to the case of dependent events $A_n = \{X_n = j\}$. With probability 1 there occur
infinitely many events $A_n$ if and only if
$$\sum_{n=1}^{\infty} \mathbf{P}(A_n) = P_j = \infty.$$


Proof By the total probability formula we have
$$p_{jj}(n) = f_j(1)p_{jj}(n - 1) + f_j(2)p_{jj}(n - 2) + \cdots + f_j(n - 1)p_{jj}(1) + f_j(n) \cdot 1.$$
Introduce the generating functions of the sequences $\{p_{jj}(n)\}_{n=0}^{\infty}$ and $\{f_j(n)\}_{n=0}^{\infty}$:
$$P_j(z) := \sum_{n=1}^{\infty} p_{jj}(n) z^n, \qquad F_j(z) := \sum_{n=1}^{\infty} f_j(n) z^n.$$
Both series converge inside the unit circle and represent analytic functions. The
above formula for $p_{jj}(n)$, after multiplying both sides by $z^n$ and summing up over
$n$, leads (by the rule of convolution) to the equality
$$P_j(z) = z f_j(1)(1 + P_j(z)) + z^2 f_j(2)(1 + P_j(z)) + \cdots = (1 + P_j(z))F_j(z).$$
Thus
$$F_j(z) = \frac{P_j(z)}{1 + P_j(z)}, \qquad P_j(z) = \frac{F_j(z)}{1 - F_j(z)}.$$
Assume that $P_j = \infty$. Then $P_j(z) \to \infty$ as $z \uparrow 1$ and therefore $F_j(z) \to 1$. Since
$F_j(z) < F_j$ for real $z < 1$, we have $F_j = 1$ and hence $E_j$ is recurrent.

Now suppose that $F_j = 1$. Then $F_j(z) \to 1$ as $z \uparrow 1$, and so $P_j(z) \to \infty$.
Therefore $P_j = \infty$.

If $E_j$ is transient, it follows from the above that $P_j < \infty$, and setting $z := 1$
we obtain equality (13.2.1).
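
The total probability formula at the start of the proof can be solved recursively for the first-return probabilities $f_j(n)$ when the return probabilities $p_{jj}(n)$ are known. The sketch below is only an illustration; the function name `first_return_probs` and the two test chains are our own choices. It applies the recursion to Example 13.1.4, where $p_{jj}(n) = 2^{-n}$, and to the symmetric walk with jumps $\pm 1$, where $p_{jj}(2n) = \binom{2n}{n} 2^{-2n}$.

```python
from fractions import Fraction
from math import comb


def first_return_probs(p):
    """Given p[n] = p_jj(n) for n = 0..N (with p[0] = 1), return f with
    f[n] = f_j(n), obtained from p_jj(n) = sum_{m=1}^{n} f_j(m) p_jj(n - m)."""
    n_max = len(p) - 1
    f = [Fraction(0)] * (n_max + 1)
    for n in range(1, n_max + 1):
        f[n] = p[n] - sum(f[m] * p[n - m] for m in range(1, n))
    return f


if __name__ == "__main__":
    # Example 13.1.4: stay put or step right with probabilities 1/2.
    lazy = [Fraction(1, 2) ** n for n in range(0, 21)]
    f_lazy = first_return_probs(lazy)
    print(f_lazy[1], sum(f_lazy))  # F_j = 1/2, in agreement with P_j / (1 + P_j), P_j = 1

    # Symmetric walk with jumps +1 and -1.
    srw = [Fraction(comb(n, n // 2), 2 ** n) if n % 2 == 0 else Fraction(0)
           for n in range(0, 21)]
    print(first_return_probs(srw)[:7])
```

In the first case $P_j = \sum_n 2^{-n} = 1$, so (13.2.1) gives $F_j = 1/2$, which is what the recursion returns.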


The quantity $P_j = \sum_{n=1}^{\infty} p_{jj}(n)$ can be interpreted as the mean number of visits
to the state $E_j$, provided that the initial state is also $E_j$. It follows from the fact that
the number of visits to the state $E_j$ can be represented as $\sum_{n=1}^{\infty} I(X_n = j)$, where,
as before, $I(A)$ is the indicator of the event $A$. Therefore the expectation of this
number is equal to
$$\mathbf{E}\sum_{n=1}^{\infty} I(X_n = j) = \sum_{n=1}^{\infty} \mathbf{E}I(X_n = j) = \sum_{n=1}^{\infty} p_{jj}(n) = P_j.$$


Theorem 13.2.1 implies the following result.

Corollary 13.2.1 A transient state is always null.


This is obvious, since it immediately follows from the convergence of the series
$\sum_{n=1}^{\infty} p_{jj}(n) < \infty$ that $p_{jj}(n) \to 0$.

Thus, based on definitions K3-K5, we could distinguish, in an irreducible chain, 
8 possible types of states (each of the three properties can either be present or not). 
But in reality there are only 6 possible types since transient states are automatically 
null, and positive states are recurrent. These six types are generated by: 

1) Classification by the asymptotic properties of the probabilities $p_{jj}(n)$
(transient, recurrent null and positive states).

2) Classification by the arithmetic properties of the probabilities $p_{jj}(n)$ or $f_j(n)$
(periodic or aperiodic).


Theorem 13.2.2 (Solidarity Theorem) In an irreducible homogeneous Markov 
chain all states are of the same type: if one is recurrent then all are recurrent, if 
one is null then all are null, if one state is periodic with period d then all states are 
periodic with the same period d. 


Proof Let $E_k$ and $E_j$ be two different states. There exist numbers $N$ and $M$ such
that
$$p_{kj}(N) > 0, \qquad p_{jk}(M) > 0.$$
The total probability formula
$$p_{kk}(N + M + n) = \sum_{l,s} p_{kl}(N)\, p_{ls}(n)\, p_{sk}(M)$$
implies the inequality
$$p_{kk}(N + M + n) \ge p_{kj}(N)\, p_{jj}(n)\, p_{jk}(M) = \alpha\beta\, p_{jj}(n).$$
Here $n > 0$ is an arbitrary integer, $\alpha = p_{kj}(N) > 0$, and $\beta = p_{jk}(M) > 0$. In the
same way one can obtain the inequality
$$p_{jj}(N + M + n) \ge \alpha\beta\, p_{kk}(n).$$
Hence
$$\frac{1}{\alpha\beta}\, p_{kk}(N + M + n) \ge p_{jj}(n) \ge \alpha\beta\, p_{kk}(n - M - N). \tag{13.2.2}$$
We see from these inequalities that the asymptotic properties of $p_{kk}(n)$ and
$p_{jj}(n)$ are the same. If $E_k$ is null, then $p_{kk}(n) \to 0$, therefore $p_{jj}(n) \to 0$ and
$E_j$ is also null. If $E_k$ is recurrent or, which is equivalent, $P_k = \sum_{n=1}^{\infty} p_{kk}(n) = \infty$,
then
$$\sum_{n=1}^{\infty} p_{jj}(n) \ge \alpha\beta \sum_{n=M+N+1}^{\infty} p_{kk}(n - M - N) = \infty,$$
and $E_j$ is also recurrent.

Suppose now that $E_k$ is a periodic state with period $d_k$. If $p_{kk}(n) > 0$, then $d_k$
divides $n$. We will write this as $d_k \mid n$. Since $p_{kk}(M + N) \ge \alpha\beta > 0$, then
$d_k \mid (M + N)$.

We now show that the state $E_j$ is also periodic and its period $d_j$ is equal to $d_k$.
Indeed, if $p_{jj}(n) > 0$ for some $n$, then by virtue of (13.2.2), $p_{kk}(n + M + N) > 0$.
Therefore $d_k \mid (n + M + N)$, and since $d_k \mid (M + N)$, $d_k \mid n$ and hence $d_k \le d_j$. In
a similar way one can prove that $d_j \le d_k$. Thus $d_j = d_k$.


If the states of an irreducible Markov chain are periodic with period d > 1, then 
the chain is called periodic. 

We will now show that the study of periodic chains can essentially be reduced to 
the study of aperiodic chains. 


Theorem 13.2.3 If a Markov chain is periodic with period $d$, then the set of states
can be split into $d$ subclasses $\Psi_0, \Psi_1, \ldots, \Psi_{d-1}$ such that, with probability 1, in one
step the system passes from $\Psi_k$ to $\Psi_{k+1}$, and from $\Psi_{d-1}$ the system passes to $\Psi_0$.


Proof Choose some state, say, $E_1$. Based on this we will construct the subclasses
$\Psi_0, \Psi_1, \ldots, \Psi_{d-1}$ in the following way: $E_i \in \Psi_\alpha$, $0 \le \alpha \le d - 1$, if there exists an
integer $k > 0$ such that $p_{1i}(kd + \alpha) > 0$.

We show that no state can belong to two subclasses simultaneously. To this end
it suffices to prove that if $E_i \in \Psi_\alpha$ and $p_{1i}(s) > 0$ for some $s$, then $s \equiv \alpha \pmod d$.

Indeed, there exists a number $t_1 > 0$ such that $p_{i1}(t_1) > 0$. So, by the definition
of $\Psi_\alpha$, we have $p_{11}(kd + \alpha + t_1) > 0$. Moreover, $p_{11}(s + t_1) > 0$. Hence
$d \mid (kd + \alpha + t_1)$ and $d \mid (s + t_1)$. This implies $\alpha \equiv s \pmod d$.

Since starting from the state $E_1$ it is possible with positive probability to enter
any state $E_i$, the union $\bigcup_\alpha \Psi_\alpha$ contains all the states.


Fig. 13.3 The structure of the matrix of transition probabilities of a periodic Markov chain: an illustration to the proof of Theorem 13.2.3


We now prove that in one step the system goes from $\Psi_\alpha$ with probability 1 to $\Psi_{\alpha+1}$ (here the sum $\alpha + 1$ is modulo $d$). We have to show that, for $E_i \in \Psi_\alpha$,

$$\sum_{E_j \in \Psi_{\alpha+1}} p_{ij} = 1.$$

To do this, it suffices to prove that $p_{ij} = 0$ when $E_i \in \Psi_\alpha$, $E_j \notin \Psi_{\alpha+1}$. If we assume the opposite ($p_{ij} > 0$) then, taking into account the inequality $p_{1i}(kd + \alpha) > 0$, we have $p_{1j}(kd + \alpha + 1) > 0$ and consequently $E_j \in \Psi_{\alpha+1}$. This contradiction completes the proof of the theorem.


We see from the theorem that the matrix of a periodic chain has the form shown 
in Fig. 13.3 where non-zero entries can only be in the shaded cells. 

From a periodic Markov chain with period $d$ one can construct $d$ new Markov chains. The states from the subset $\Psi_\alpha$ will be the states of the $\alpha$-th chain. Transition probabilities are given by

$$p^{\alpha}_{ij} := p_{ij}(d).$$

By virtue of Theorem 13.2.3, $\sum_{E_j \in \Psi_\alpha} p^{\alpha}_{ij} = 1$. The new chains, to which one can reduce in a certain sense the original one, will have no subclasses.
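The construction used in the proof of Theorem 13.2.3 can be tried out numerically. The following Python sketch is an illustration only and is not part of the original text: the function name `cyclic_classes` and the example matrix are assumptions, and the chain is assumed to be finite and irreducible. It measures distances from state 0 by breadth-first search, takes the period as the g.c.d. of the quantities dist(u) + 1 - dist(v) over all possible one-step transitions u -> v, and places state i into the subclass with index dist(i) mod d.

```python
from collections import deque
from math import gcd


def cyclic_classes(P):
    """Return (d, classes) for a finite irreducible chain with matrix P.

    d is the period; classes[a] lists the states of the a-th subclass,
    with state 0 placed in subclass 0.
    """
    n = len(P)
    dist = [None] * n
    dist[0] = 0
    queue = deque([0])
    while queue:                      # breadth-first search from state 0
        u = queue.popleft()
        for v in range(n):
            if P[u][v] > 0 and dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    d = 0
    for u in range(n):                # g.c.d. over all possible transitions
        for v in range(n):
            if P[u][v] > 0:
                d = gcd(d, dist[u] + 1 - dist[v])
    classes = [[] for _ in range(d)]
    for i in range(n):
        classes[dist[i] % d].append(i)
    return d, classes


# Symmetric walk on a cycle with four points (hypothetical example).
P = [[0.0, 0.5, 0.0, 0.5],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.5, 0.0, 0.5, 0.0]]
d, classes = cyclic_classes(P)
print(d, classes)
```

For this matrix every one-step transition leads from one subclass to the other, which is the block structure described by the theorem.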


13.3 Theorems on Random Walks on a Lattice 


1. A random walk on integer points on the line. Imagine a particle moving on integer points of the real line. Transitions from one point to another occur in equal time intervals. In one step, from point $k$ the particle goes with a positive probability $p$ to the point $k + 1$, and with positive probability $q = 1 - p$ it moves to the point $k - 1$. As was already mentioned, to this physical system there corresponds the following Markov chain:

$$X_n = X_{n-1} + \xi_n = X_0 + S_n,$$

where $\xi_n$ takes values $1$ and $-1$ with probabilities $p$ and $q$, respectively, and $S_n = \sum_{k=1}^{n} \xi_k$. The states of the chain are integer points on the line.




It is easy to see that returning to a given point with a positive probability is only possible after an even number of steps, and $f_0(2) = 2pq > 0$. Therefore this chain is periodic with period 2.

We now establish conditions under which the random walk forms a recurrent chain.


Theorem 13.3.1 The random walk $\{X_n\}$ forms a recurrent Markov chain if and only if $p = q = 1/2$.


Proof Since 0 < p < 1, the random walk is an irreducible Markov chain. Therefore 
by Theorem 13.2.2 it suffices to examine the type of any given point, for example, 
zero. 

We will make use of Theorem 13.2.1. In order to do this, we have to investigate the convergence of the series $\sum_{n=1}^{\infty} p_{00}(n)$. Since our chain is periodic with period 2, one has $p_{00}(2k + 1) = 0$. So it remains to compute $\sum_{k=1}^{\infty} p_{00}(2k)$. The sum $S_n$ is the coordinate of the walking particle after $n$ steps ($X_0 = 0$). Therefore $p_{00}(2k) = \mathbf{P}(S_{2k} = 0)$. The equality $S_{2k} = 0$ holds if $k$ of the random variables $\xi_j$ are equal to $1$ and the other $k$ are equal to $-1$ ($k$ steps to the right and $k$ steps to the left). Therefore, by Theorem 5.2.1,

$$\mathbf{P}(S_{2k} = 0) \sim \frac{1}{\sqrt{\pi k}}\, e^{-2kH(1/2)} = \frac{1}{\sqrt{\pi k}}\, (4pq)^k.$$

We now elucidate the behaviour of the function $\beta(p) = 4pq = 4p(1 - p)$ on the interval $[0, 1]$. At the point $p = 1/2$ the function $\beta(p)$ attains its only extremum, $\beta(1/2) = 1$. At all the other points of $[0, 1]$, $\beta(p) < 1$. Therefore $4pq < 1$ for $p \ne 1/2$, which implies convergence of the series $\sum_{k=1}^{\infty} p_{00}(2k)$ and hence the transience of the Markov chain. But if $p = 1/2$ then $p_{00}(2k) \sim 1/\sqrt{\pi k}$ and the series $\sum_{k=1}^{\infty} p_{00}(2k)$ diverges, which implies, in turn, that all the states of the chain are recurrent. The theorem is proved.


Theorem 13.3.1 allows us to make the following remark. If $p \ne 1/2$, then the mean number of recurrences to 0 is finite, as it is equal to $\sum_{k=1}^{\infty} p_{00}(2k)$. This means that, after a certain time, the particle will never return to zero. The particle will "drift" to the right or to the left depending on whether $p$ is greater than $1/2$ or less. This can easily be obtained from the law of large numbers.

If $p = 1/2$, then the mean number of recurrences to 0 is infinite; the particle has no "drift". It is interesting to note that the increase in the mean number of recurrences is not proportional to the number of steps. Indeed, the mean number of recurrences over the first $2n$ steps is equal to $\sum_{k=1}^{n} p_{00}(2k)$. From the proof of Theorem 13.3.1 we know that $p_{00}(2k) \sim 1/\sqrt{\pi k}$. Therefore, as $n \to \infty$,

$$\sum_{k=1}^{n} p_{00}(2k) \sim \frac{1}{\sqrt{\pi}} \sum_{k=1}^{n} \frac{1}{\sqrt{k}} \sim \frac{2\sqrt{n}}{\sqrt{\pi}}.$$
k=1 a k=1 1k - 




Thus, in the fair game considered in Example 4.2.2, the proportion of ties rapidly 
decreases as the number of steps increases, and deviations are growing both in mag- 
nitude and duration. 
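The asymptotic relations for the return probabilities can be checked against exact values. The Python sketch below is an illustration only: the function name `return_probs` and the chosen values n = 2000 and p = 0.6 are assumptions. It uses the exact formula $p_{00}(2k) = \binom{2k}{k}(pq)^k$, evaluated through the ratio of consecutive terms to avoid huge binomial coefficients.

```python
from math import pi, sqrt


def return_probs(n, p=0.5):
    """Exact values p_00(2k) = C(2k, k) (pq)^k for k = 1, ..., n."""
    q = 1.0 - p
    probs = []
    term = 1.0
    for k in range(1, n + 1):
        # C(2k, k) / C(2k - 2, k - 1) = 2k (2k - 1) / k^2
        term *= 2 * k * (2 * k - 1) / (k * k) * p * q
        probs.append(term)
    return probs


n = 2000
fair = return_probs(n)
# Ratio of p_00(2n) to its asymptotic value 1 / sqrt(pi n).
ratio_local = fair[-1] * sqrt(pi * n)
# Ratio of the mean number of returns in 2n steps to 2 sqrt(n / pi).
ratio_sum = sum(fair) / (2 * sqrt(n / pi))
# For p != 1/2 the series converges; here p = 0.6.
biased_total = sum(return_probs(n, p=0.6))
print(ratio_local, ratio_sum, biased_total)
```

Both ratios are close to 1 in the symmetric case. For p = 0.6 the partial sum agrees with the value $1/|p - q| - 1 = 4$ obtained from the generating function $\sum_{k \ge 0} \binom{2k}{k} x^k = (1 - 4x)^{-1/2}$.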


13.3.1 Symmetric Random Walks in $\mathbb{R}^k$, $k \ge 2$


Consider the following random walk model in the $k$-dimensional Euclidean space $\mathbb{R}^k$. If the walking particle is at point $(m_1, \ldots, m_k)$, then it can move with probabilities $1/2^k$ to any of the $2^k$ vertices of the cube $|x_j - m_j| = 1$, i.e. the points with coordinates $(m_1 \pm 1, \ldots, m_k \pm 1)$. It is natural to call this walk symmetric. Denoting by $X_n$ the position of the particle after the $n$-th jump, we have, as before, a sequence of $k$-dimensional random variables forming a homogeneous irreducible Markov chain. We shall show that all states of the walk on the plane are, as in the one-dimensional case, recurrent. In the three-dimensional space, the states will turn out to be transient. Thus we shall prove the following assertion.


Theorem 13.3.2 The symmetric random walk is recurrent in spaces of one and two 
dimensions and transient in spaces of three or more dimensions. 


In this context, W. Feller made the sharp comment that the proverb "all roads lead to Rome" is true only for two-dimensional surfaces. The assertion of Theorem 13.3.2 is adjacent to the famous theorem of Pólya on the transience of symmetric walks in $\mathbb{R}^k$ for $k > 2$ when the particle jumps to neighbouring points along the coordinate axes (so that $\xi_j$ assumes $2k$ values with probabilities $1/2k$ each). We now turn to the proof of Theorem 13.3.2.


Proof of Theorem 13.3.2 Let $k = 2$. It is not difficult to see that our walk $X_n$ can be represented as a sum of two independent components

$$X_n = (X_n^1, 0) + (0, X_n^2), \qquad (X_0^1, X_0^2) = X_0,$$

where $X_n^i$, $i = 1, 2$, are scalar (one-dimensional) sequences describing symmetric independent random walks on the respective lines (axes). This is obvious, for the two-dimensional sequence admits the representation

$$X_{n+1} = X_n + \xi_n, \tag{13.3.1}$$

where $\xi_n$ assumes 4 values $(\pm 1, 0) + (0, \pm 1) = (\pm 1, \pm 1)$ with probabilities $1/4$ each.

With the help of representation (13.3.1) we can investigate the asymptotic behaviour of the transition probabilities $p_{ij}(n)$. Let $X_0$ coincide with the origin $(0, 0)$. Then

$$p_{00}(2n) = \mathbf{P}\bigl(X_{2n} = (0, 0) \mid X_0 = (0, 0)\bigr) = \mathbf{P}(X_{2n}^1 = 0 \mid X_0^1 = 0)\, \mathbf{P}(X_{2n}^2 = 0 \mid X_0^2 = 0) \sim \bigl(1/\sqrt{\pi n}\bigr)^2 = 1/(\pi n).$$

From this it follows that the series $\sum_{n=1}^{\infty} p_{00}(n)$ diverges and so all the states of our chain are recurrent.

The case $k = 3$ should be treated in a similar way. Represent the sequence $X_n$ as a sum of three independent components

$$X_n = (X_n^1, 0, 0) + (0, X_n^2, 0) + (0, 0, X_n^3),$$

where the $X_n^i$ are, as before, symmetric random walks on the real line. If we set $X_0 = (0, 0, 0)$, then

$$p_{00}(2n) = \bigl(\mathbf{P}(X_{2n}^1 = 0 \mid X_0^1 = 0)\bigr)^3 \sim 1/(\pi n)^{3/2}.$$

The series $\sum_{n=1}^{\infty} p_{00}(n)$ is convergent here, and hence the states of the chain are transient. In contrast to the straight line and plane cases, a particle leaving the origin will, with a positive probability, never come back.

It is evident that a similar situation takes place for walks in $k$-dimensional space with $k > 3$, since $\sum_{n=1}^{\infty} (\pi n)^{-k/2} < \infty$ for $k \ge 3$. The theorem is proved.
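The dichotomy between dimensions can be seen from the partial sums of $p_{00}(2n) = \bigl(\binom{2n}{n} 4^{-n}\bigr)^k$. The following Python sketch is an illustration only (the function name `partial_sums` and the checkpoints 10000 and 20000 are assumptions): it compares the growth of the partial sums between two values of n for k = 1, 2, 3.

```python
from math import log, pi


def partial_sums(k_dim, checkpoints):
    """Partial sums of p_00(2n) = (C(2n, n) / 4^n)^k_dim at the given n."""
    wanted = set(checkpoints)
    sums = {}
    total = 0.0
    term = 1.0                       # C(2n, n) / 4^n for n = 0
    for n in range(1, max(checkpoints) + 1):
        term *= (2 * n - 1) / (2 * n)   # ratio of consecutive terms
        total += term ** k_dim
        if n in wanted:
            sums[n] = total
    return sums


for k_dim in (1, 2, 3):
    s = partial_sums(k_dim, [10000, 20000])
    # Increment of the partial sum when n doubles from 10000 to 20000.
    print(k_dim, s[20000] - s[10000])
print(log(2) / pi)                   # predicted increment for k = 2
```

For k = 1 the increment is large (growth of order $\sqrt{n}$), for k = 2 it is close to $(\ln 2)/\pi$ (logarithmic growth), and for k = 3 it is already small, in agreement with the convergence of the series.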


13.3.2 Arbitrary Symmetric Random Walks on the Line 


Let, as before,

$$X_n = X_0 + \sum_{j=1}^{n} \xi_j, \tag{13.3.2}$$

but now $\xi_j$ are arbitrary independent identically distributed integer-valued random variables. Theorem 13.3.1 may be generalised in the following way:


Theorem 13.3.3 If the $\xi_j$ are symmetric and the expectation $\mathbf{E}\xi_j$ exists (and hence $\mathbf{E}\xi_j = 0$) then the random walk $X_n$ forms a recurrent Markov chain with null states.


Proof It suffices to verify that

$$\sum_{n=1}^{\infty} \mathbf{P}(S_n = 0) = \infty,$$

where $S_n = \sum_{j=1}^{n} \xi_j$, and that $\mathbf{P}(S_n = 0) \to 0$ as $n \to \infty$. Put

$$p(z) := \mathbf{E} z^{\xi_1} = \sum_{k=-\infty}^{\infty} z^k\, \mathbf{P}(\xi_1 = k).$$

Then the generating function of $S_n$ will be equal to $\mathbf{E} z^{S_n} = p^n(z)$, and by the inversion formula (see Sect. 7.7)

$$\mathbf{P}(S_n = 0) = \frac{1}{2\pi i} \oint_{|z|=1} p^n(z)\, z^{-1}\, dz, \tag{13.3.3}$$

$$\sum_{n=0}^{\infty} \mathbf{P}(S_n = 0) = \frac{1}{2\pi i} \oint_{|z|=1} \frac{dz}{z\,(1 - p(z))} = \frac{1}{\pi} \int_0^{\pi} \frac{dt}{1 - p(e^{it})}.$$

The last equality holds since the real function $p(e^{it})$ is even and is obtained by substituting $z = e^{it}$.

Since $\mathbf{E}\xi_1 = 0$, one has $1 - p(e^{it}) = o(t)$ as $t \to 0$ and, for sufficiently small $\delta$ and $0 \le t < \delta$,

$$0 \le 1 - p(e^{it}) < t$$

(the function $p(e^{it})$ is real by virtue of the symmetry of $\xi_1$). This implies

$$\int_0^{\pi} \frac{dt}{1 - p(e^{it})} \ge \int_0^{\delta} \frac{dt}{t} = \infty.$$

Convergence $\mathbf{P}(S_n = 0) \to 0$ is a consequence of (13.3.3) since, for all $z$ on the circle $|z| = 1$, with the possible exclusion of finitely many points, one has $|p(z)| < 1$ and hence $p^n(z) \to 0$ as $n \to \infty$. The theorem is proved.


Theorem 13.3.3 can be supplemented by the following assertion. 


Theorem 13.3.4 Under the conditions of Theorem 13.3.3, if the g.c.d. of the possible values of $\xi_j$ equals 1 then the set of values of $\{X_n\}$ constitutes a single class of essential communicating states. This class coincides with the set of all integers.


The assertion of the theorem follows from the next lemma. 


Lemma 13.3.1 If the g.c.d. of integers $a_1 > 0, \ldots, a_r > 0$ is equal to 1, then there exists a number $K$ such that every natural $k \ge K$ can be represented as

$$k = n_1 a_1 + \cdots + n_r a_r,$$

where $n_i \ge 0$ are some integers.


Proof Consider the function $L(\mathbf{n}) = n_1 a_1 + \cdots + n_r a_r$, where $\mathbf{n} = (n_1, \ldots, n_r)$ is a vector with integer (possibly negative) components. Let $d > 0$ be the minimal natural number for which there exists a vector $\mathbf{n}^0$ such that

$$d = L(\mathbf{n}^0).$$

We show that every natural number that can be represented as $L(\mathbf{n})$ is divisible by $d$. Suppose that this is not true. Then there exist $\mathbf{n}$, $k$ and $0 < \alpha < d$ such that

$$L(\mathbf{n}) = kd + \alpha.$$

But since the function $L(\mathbf{n})$ is linear,

$$L(\mathbf{n} - k\mathbf{n}^0) = kd + \alpha - kd = \alpha < d,$$

which contradicts the minimality of $d$ in the set of positive integer values of $L(\mathbf{n})$.

The numbers $a_1, \ldots, a_r$ are also the values of the function $L(\mathbf{n})$, so they are divisible by $d$. The greatest common divisor of these numbers is by assumption equal to one, so that $d = 1$.

Let $k$ be an arbitrary natural number. Denoting by $\theta < A$ the remainder after dividing $k$ by $A := a_1 + \cdots + a_r$, we can write

$$k = m(a_1 + \cdots + a_r) + \theta = m(a_1 + \cdots + a_r) + \theta L(\mathbf{n}^0) = a_1(m + \theta n_1^0) + a_2(m + \theta n_2^0) + \cdots + a_r(m + \theta n_r^0),$$

where $n_i := m + \theta n_i^0 \ge 0$, $i = 1, \ldots, r$, for sufficiently large $k$ (or $m$).
The lemma is proved.
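The statement of Lemma 13.3.1 is easy to explore by direct computation. The Python sketch below is an illustration only and not the argument of the proof: the names `representable` and `threshold` and the search bound `limit` are assumptions, and the check covers only the integers up to `limit`, so it suggests a value of K rather than proving one.

```python
from functools import reduce
from math import gcd


def representable(a, limit):
    """ok[k] is True if k = n_1 a_1 + ... + n_r a_r with integers n_i >= 0."""
    ok = [False] * (limit + 1)
    ok[0] = True
    for k in range(1, limit + 1):
        ok[k] = any(k >= x and ok[k - x] for x in a)
    return ok


def threshold(a, limit):
    """Smallest K such that every k in [K, limit] is representable."""
    assert reduce(gcd, a) == 1, "the lemma requires g.c.d. equal to 1"
    ok = representable(a, limit)
    gaps = [k for k in range(1, limit + 1) if not ok[k]]
    return gaps[-1] + 1 if gaps else 1


print(threshold((3, 5), 100))        # largest gap below 100 is 7
print(threshold((6, 10, 15), 200))   # largest gap below 200 is 29
```

In the second example no two of the numbers 6, 10, 15 are coprime, yet their common g.c.d. is 1, which is all the lemma requires.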


Proof of Theorem 13.3.4 Put $q_j := \mathbf{P}(\xi = a_j) > 0$. Then, for each $k \ge K$, there exists an $\mathbf{n}$ such that $n_j \ge 0$, $\sum_{j=1}^{r} a_j n_j = k$, and hence, for $n = \sum_{j=1}^{r} n_j$, we have

$$p_{0k}(n) \ge q_1^{n_1} \cdots q_r^{n_r} > 0.$$

In other words, all the states $k \ge K$ are reachable from 0. Similarly, all the states $k \le -K$ are reachable from 0. The states $k \in [-K, K]$ are reachable from the point $-2K$ (which is reachable from 0). The theorem is proved.


Corollary 13.3.1 If the conditions of Theorems 13.3.3 and 13.3.4 are satisfied, then the chain (13.3.2) with an arbitrary initial state $X_0$ visits every state $k$ infinitely many times with probability 1. In particular, for any $X_0$ and $k$, the random variable $\nu = \min\{n : X_n = k\}$ will be proper.


If we are interested in investigating the periodicity of the chain (13.3.2), then more detailed information on the set of possible values of $\xi_j$ is needed. We leave it to the reader to verify that, for example, if this set is of the form $\{a + a_k d\}$, $k = 1, 2, \ldots$, $d > 1$, g.c.d.$(a_1, a_2, \ldots) = 1$, g.c.d.$(a, d) = 1$, then the chain will be periodic with period $d$.




13.4 Limit Theorems for Countable Homogeneous Chains 


13.4.1 Ergodic Theorems 


Now we return to arbitrary countable homogeneous Markov chains. We will need 
the following conditions: 


(I) There exists a state $E_s$ such that the recurrence time $\tau^{(s)}$ to $E_s$ ($\mathbf{P}(\tau^{(s)} = n) = f_s(n)$) has finite expectation $\mathbf{E}\tau^{(s)} < \infty$.

(II) The chain is irreducible.

(III) The chain is aperiodic.


We introduce the so-called "taboo probabilities" $P_i(n, j)$ of transition from $E_i$ to $E_j$ in $n$ steps without visiting the "forbidden" state $E_i$:

$$P_i(n, j) = \mathbf{P}(X_n = j;\ X_1 \ne i, \ldots, X_{n-1} \ne i \mid X_0 = i).$$


Theorem 13.4.1 (The ergodic theorem) Conditions (I)–(III) are necessary and sufficient for the existence, for all $i$ and $j$, of the positive limits

$$\lim_{n \to \infty} p_{ij}(n) = \pi_j > 0, \qquad i, j = 0, 1, 2, \ldots. \tag{13.4.1}$$

The sequence of values $\{\pi_j\}$ is the unique solution of the system

$$\sum_{j=0}^{\infty} \pi_j = 1, \qquad \pi_j = \sum_{k=0}^{\infty} \pi_k p_{kj}, \quad j = 0, 1, 2, \ldots, \tag{13.4.2}$$

in the class of absolutely convergent series.
Moreover, $\mathbf{E}\tau^{(j)} < \infty$ for all $j$, and the quantities $\pi_j = (\mathbf{E}\tau^{(j)})^{-1}$ admit the representation

$$\pi_j = (\mathbf{E}\tau^{(j)})^{-1} = (\mathbf{E}\tau^{(s)})^{-1} \sum_{k=1}^{\infty} P_s(k, j) \tag{13.4.3}$$

for any $s$.

Definition 13.4.1 A chain possessing property (13.4.1) is called ergodic.


The numbers $\pi_j$ are essentially the probabilities that the system will be in the respective states $E_j$ after a long period of time has passed. It turns out that these probabilities lose dependence on the initial state of the system. The system "forgets" where it began its motion. The distribution $\{\pi_j\}$ is called stationary or invariant. Property (13.4.2) expresses the invariance of the distribution with respect to the transition probabilities $p_{ij}$. In other words, if $\mathbf{P}(X_n = k) = \pi_k$, then $\mathbf{P}(X_{n+1} = k) = \sum_j \pi_j p_{jk}$ is also equal to $\pi_k$.
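The loss of dependence on the initial state and the invariance property (13.4.2) can be observed on a small example. The Python sketch below is an illustration only: the three-state matrix `P` and the names `step` and `limit_distribution` are assumptions chosen for the example. It iterates the one-step relation for the distribution of $X_n$ from each possible initial state.

```python
def step(dist, P):
    """Distribution of X_{n+1} given that X_n has distribution dist."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def limit_distribution(P, start, iters=200):
    """Approximate row `start` of the n-step matrix for n = iters."""
    dist = [1.0 if i == start else 0.0 for i in range(len(P))]
    for _ in range(iters):
        dist = step(dist, P)
    return dist


# Hypothetical irreducible aperiodic chain with three states.
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]

for start in range(3):
    print(start, limit_distribution(P, start))

pi_limit = limit_distribution(P, 0)
print(step(pi_limit, P))   # one more step leaves the distribution unchanged
```

For this matrix all three rows approach (0.25, 0.5, 0.25), which can also be verified by substituting into (13.4.2).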




Proof of Theorem 13.4.1 Sufficiency in the first assertion of the theorem. Consider the "trajectory" of the Markov chain starting at a fixed state $E_s$. Let $\tau_1 \ge 1$, $\tau_2 \ge 1$, ... be the time intervals between successive returns of the system to $E_s$. Since after each return the evolution of the system begins anew from the same state, by the Markov property the durations $\tau_k$ of the cycles (as well as the cycles themselves) are independent and identically distributed, $\tau_k \overset{d}{=} \tau^{(s)}$. Moreover, it is obvious that

$$\mathbf{P}(\tau_k = n) = \mathbf{P}(\tau^{(s)} = n) = f_s(n).$$

Recurrence of $E_s$ means that the $\tau_k$ are proper random variables. Aperiodicity of $E_s$ means that the g.c.d. of all possible values of $\tau_k$ is equal to 1. Since

$$p_{ss}(n) = \mathbf{P}(\gamma(n) = 0),$$

where $\gamma(n)$ is the defect of level $n$ for the renewal process $\{T_k\}$,

$$T_k = \sum_{i=1}^{k} \tau_i,$$

by Theorem 10.3.1 the following limit exists

$$\lim_{n \to \infty} p_{ss}(n) = \lim_{n \to \infty} \mathbf{P}(\gamma(n) = 0) = \frac{1}{\mathbf{E}\tau_1} > 0. \tag{13.4.4}$$

Now prove the existence of $\lim_{n \to \infty} p_{sj}(n)$ for $j \ne s$. If $\gamma(n)$ is the defect of level $n$ for the walk $\{T_k\}$ then, by the total probability formula,

$$p_{sj}(n) = \sum_{k=1}^{n} \mathbf{P}(\gamma(n) = k)\, \mathbf{P}(X_n = j \mid X_0 = s,\ \gamma(n) = k). \tag{13.4.5}$$

Note that the second factors in the terms on the right-hand side of this formula do not depend on $n$ by the Markov property:

$$\begin{aligned}
\mathbf{P}(X_n = j \mid X_0 = s,\ \gamma(n) = k)
&= \mathbf{P}(X_n = j \mid X_0 = s,\ X_{n-1} \ne s, \ldots, X_{n-k+1} \ne s,\ X_{n-k} = s) \\
&= \mathbf{P}(X_k = j \mid X_0 = s,\ X_1 \ne s, \ldots, X_{k-1} \ne s) = \frac{P_s(k, j)}{\mathbf{P}(\tau_1 \ge k)},
\end{aligned} \tag{13.4.6}$$

since, for a fixed $X_0 = s$,

$$\mathbf{P}(X_k = j \mid X_1 \ne s, \ldots, X_{k-1} \ne s) = \frac{\mathbf{P}(X_k = j,\ X_1 \ne s, \ldots, X_{k-1} \ne s)}{\mathbf{P}(X_1 \ne s, \ldots, X_{k-1} \ne s)} = \frac{P_s(k, j)}{\mathbf{P}(\tau_1 \ge k)}.$$




For the sake of brevity, put $\mathbf{P}(\tau_1 > k) = P_k$. The first factors in (13.4.5) converge, as $n \to \infty$, to $P_{k-1}/\mathbf{E}\tau_1$ and, by virtue of the equality

$$\mathbf{P}(\gamma(n) = k) = \mathbf{P}(\gamma(n - k) = 0)\, P_{k-1} \le P_{k-1}, \tag{13.4.7}$$

are dominated by the convergent sequence $P_{k-1}$. Therefore, by the dominated convergence theorem, the following limit exists

$$\lim_{n \to \infty} p_{sj}(n) = \sum_{k=1}^{\infty} \frac{P_{k-1}}{\mathbf{E}\tau_1}\, \frac{P_s(k, j)}{\mathbf{P}(\tau_1 \ge k)} = \frac{1}{\mathbf{E}\tau_1} \sum_{k=1}^{\infty} P_s(k, j) =: \pi_j, \tag{13.4.8}$$

and we have, by (13.4.5)–(13.4.7),

$$p_{sj}(n) \le \sum_{k=1}^{n} P_s(k, j) \le \sum_{k=1}^{\infty} P_s(k, j) = \pi_j \mathbf{E}\tau_1. \tag{13.4.9}$$

To establish that, for any $i$,

$$\lim_{n \to \infty} p_{ij}(n) = \pi_j > 0,$$

we first show that the system departing from $E_i$ will, with probability 1, eventually reach $E_s$.

In other words, if $f_{is}(n)$ is the probability that the system, upon leaving $E_i$, hits $E_s$ for the first time on the $n$-th step then

$$\sum_{n=1}^{\infty} f_{is}(n) = 1.$$

Indeed, both states $E_i$ and $E_s$ are recurrent. Consider the cycles formed by subsequent visits of the system to the state $E_i$. Denote by $A_k$ the event that the system is in the state $E_s$ at least once during the $k$-th cycle. By the Markov property the events $A_k$ are independent and $\mathbf{P}(A_k) > 0$ does not depend on $k$. Therefore, by the Borel–Cantelli zero–one law (see Sect. 11.1), with probability 1 there will occur infinitely many events $A_k$ and hence $\mathbf{P}(\bigcup A_k) = 1$.

By the total probability formula,

$$p_{ij}(n) = \sum_{k=1}^{n} f_{is}(k)\, p_{sj}(n - k),$$

and the dominated convergence theorem yields

$$\lim_{n \to \infty} p_{ij}(n) = \sum_{k=1}^{\infty} f_{is}(k)\, \pi_j = \pi_j.$$

Representation (13.4.3) follows from (13.4.8).




Now we will prove the necessity in the first assertion of the theorem. That conditions (II)–(III) are necessary is obvious, since $p_{ij}(n) > 0$ for every $i$ and $j$ if $n$ is large enough. The necessity of condition (I) follows from the fact that equalities (13.4.4) are valid for $E_s$. The first part of the theorem is proved.

It remains to prove the second part of the theorem. Since

$$\sum_{j} p_{sj}(n) = 1,$$

one has $\sum_j \pi_j \le 1$. By virtue of the inequalities $p_{sj}(n) \le \pi_j \mathbf{E}\tau_1$ (see (13.4.9)), we can use the dominated convergence theorem both in the last equality and in the equality $p_{sj}(n + 1) = \sum_{k=0}^{\infty} p_{sk}(n)\, p_{kj}$, which yields

$$\sum_{j} \pi_j = 1, \qquad \pi_j = \sum_{k=0}^{\infty} \pi_k p_{kj}.$$

It remains to show that the system has a unique solution. Let the numbers $\{q_j\}$ also satisfy (13.4.2) and assume the series $\sum |q_j|$ converges. Then, changing the order of summation, we obtain that

$$\begin{aligned}
q_j &= \sum_k q_k p_{kj} = \sum_k p_{kj} \Bigl( \sum_l q_l p_{lk} \Bigr) = \sum_l q_l \sum_k p_{lk} p_{kj} = \sum_l q_l\, p_{lj}(2) \\
&= \sum_l p_{lj}(2) \sum_m q_m p_{ml} = \sum_m q_m\, p_{mj}(3) = \cdots = \sum_k q_k\, p_{kj}(n)
\end{aligned}$$

for any $n$. Since $\sum q_k = 1$, passing to the limit as $n \to \infty$ gives

$$q_j = \sum_k q_k \pi_j = \pi_j.$$

The theorem is proved.


If a Markov chain is periodic with period $d$, then $p_{ij}(t) = 0$ for $t \ne kd$ and every pair of states $E_i$ and $E_j$ belonging to the same subclass (see Theorem 13.2.3). But if $t = kd$, then from the theorem just proved and Theorem 13.2.3 it follows that the limit $\lim_{k \to \infty} p_{ij}(kd) = \pi_j > 0$ exists and does not depend on $i$.

Verifying conditions (II)–(III) of Theorem 13.4.1 usually presents no serious difficulties. The main difficulties would be related to verifying condition (I). For finite Markov chains, this condition is always met.


Theorem 13.4.2 Let a Markov chain have finitely many states and satisfy conditions (II)–(III). Then there exist $c > 0$ and $q < 1$ such that, for the recurrence time $\tau$ to an arbitrary fixed state, one has

$$\mathbf{P}(\tau > n) < c q^n, \qquad n \ge 1. \tag{13.4.10}$$




These inequalities clearly mean that condition (I) is always met for finite chains and hence the ergodic theorem for them holds if and only if conditions (II)–(III) are satisfied.


Proof Consider a state $E_s$ and put

$$r_j(n) := \mathbf{P}(X_k \ne s,\ k = 1, 2, \ldots, n \mid X_0 = j).$$

Then, if the chain has $m$ states one has $r_j(m) < 1$ for any $j$. Indeed, $r_j(n)$ does not grow as $n$ increases. Let $N$ be the smallest number satisfying $r_j(N) < 1$. This means that there exists a sequence of states $E_j, E_{j_1}, \ldots, E_{j_N}$ such that $E_{j_N} = E_s$ and the probability of this sequence $p_{j j_1} \cdots p_{j_{N-1} j_N}$ is positive. But it is easy to see that $N \le m$, since otherwise this sequence would contain at least two identical states. Therefore the cycle contained between these states could be removed from the sequence which could only increase its probability. Thus

$$r_j(m) < 1, \qquad r(m) := \max_j r_j(m) < 1.$$

Moreover, $r_j(n_1 + n_2) \le r_j(n_1)\, r(n_2) \le r(n_1)\, r(n_2)$.
It remains to note that if $\tau$ is the recurrence time to $E_s$, then $\mathbf{P}(\tau > nm) = r_s(nm) \le r(m)^n$. The statement of the theorem follows.
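The quantities $r_j(n)$ from the proof satisfy the recursion $r_j(n) = \sum_{k \ne s} p_{jk}\, r_k(n - 1)$, $r_j(0) = 1$, which makes the geometric bound easy to check on an example. The Python sketch below is an illustration only: the function name `taboo_survival` and the three-state matrix are assumptions.

```python
def taboo_survival(P, s, n):
    """r_j(n) = P(X_1 != s, ..., X_n != s | X_0 = j) for every state j."""
    m = len(P)
    r = [1.0] * m                     # r_j(0) = 1
    for _ in range(n):
        # Condition on the first step, which must avoid the state s.
        r = [sum(P[j][k] * r[k] for k in range(m) if k != s)
             for j in range(m)]
    return r


# Hypothetical chain with m = 3 states; the taboo state is s = 0.
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]
m, s = 3, 0
r_m = max(taboo_survival(P, s, m))    # r(m) = max_j r_j(m) < 1
for n in (1, 2, 5, 10):
    tail = taboo_survival(P, s, n * m)[s]   # P(tau > nm), start at E_s
    print(n, tail, r_m ** n)
```

In each printed line the second number, $\mathbf{P}(\tau > nm)$, does not exceed the third, $r(m)^n$.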


Remark 13.4.1 Condition (13.4.10) implies the exponential rate of convergence of the differences $|p_{ij}(n) - \pi_j|$ to zero. One can verify this by making use of the analyticity of the function

$$F_s(z) = \sum_{n=1}^{\infty} f_s(n) z^n$$

in the domain $|z| < q^{-1}$, $q^{-1} > 1$, and of the equality

$$P_s(z) = \sum_{n=1}^{\infty} p_{ss}(n) z^n = \frac{1}{1 - F_s(z)} - 1 \tag{13.4.11}$$

(see Theorem 13.2.1; we assume that the $\tau$ in condition (13.4.10) refers to the state $E_s$, so that $f_s(n) = \mathbf{P}(\tau = n)$). Since $F_s'(1) = \mathbf{E}\tau = 1/\pi_s$, one has

$$F_s(z) = 1 + \frac{z - 1}{\pi_s} + \cdots,$$

and from (13.4.11) it follows that the function

$$P_s(z) - \frac{z \pi_s}{1 - z} = \sum_{n=1}^{\infty} \bigl(p_{ss}(n) - \pi_s\bigr) z^n$$

is analytic in the disk $|z| \le 1 + \varepsilon$, $\varepsilon > 0$. It evidently follows from this that

$$|p_{ss}(n) - \pi_s| < c(1 + \varepsilon)^{-n}, \qquad c = \text{const}.$$


Now we will give two examples of finite Markov chains. 


Example 13.4.1 Suppose that the behaviour of two chess players A and B playing in a multi-player tournament can be described as follows. Independently of the outcomes of the previous games, player A wins every new game with probability $p$, loses with probability $q$, and makes a tie with probability $r = 1 - p - q$. Player B is less balanced. He wins a game with probabilities $p + \varepsilon$, $p$ and $p - \varepsilon$, respectively, if he won, made a tie, or lost in the previous one. The probability that he loses behaves in a similar way: in the above three cases, it equals $q - \varepsilon$, $q$ and $q + \varepsilon$, respectively. Which of the players A and B will score more points in a long tournament?

To answer this question, we will need to compute the stationary probabilities $\pi_1, \pi_2, \pi_3$ of the states $E_1, E_2, E_3$ which represent a win, tie, and loss in a game, respectively (cf. the law of large numbers at the end of this section).

For player A, the Markov chain with states $E_1, E_2, E_3$ describing his performance in the tournament will have the matrix of transition probabilities

$$P_A = \begin{pmatrix} p & r & q \\ p & r & q \\ p & r & q \end{pmatrix}.$$

It is obvious that $\pi_1 = p$, $\pi_2 = r$, $\pi_3 = q$ here.
For player B, the matrix of transition probabilities is equal to

$$P_B = \begin{pmatrix} p + \varepsilon & r & q - \varepsilon \\ p & r & q \\ p - \varepsilon & r & q + \varepsilon \end{pmatrix}.$$

Equations for stationary probabilities in this case have the form

$$\begin{aligned}
\pi_1(p + \varepsilon) + \pi_2 p + \pi_3(p - \varepsilon) &= \pi_1, \\
\pi_1 r + \pi_2 r + \pi_3 r &= \pi_2, \\
\pi_1 + \pi_2 + \pi_3 &= 1.
\end{aligned}$$

Solving this system we find that

$$\pi_2 - r = 0, \qquad \pi_1 - p = \varepsilon\, \frac{p - q}{1 - 2\varepsilon}.$$

Thus, the long run proportions of ties will be the same for both players, and B will have a greater proportion of wins if $\varepsilon > 0$, $p > q$ or $\varepsilon < 0$, $p < q$. If $p = q$, then the stationary distributions will be the same for both A and B.
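The solution for player B can be verified with exact rational arithmetic. The Python sketch below is an illustration only: the function names and the numerical values p = 1/2, q = 1/4, ε = 1/10 are assumptions chosen so that all entries of the matrix are valid probabilities.

```python
from fractions import Fraction as F


def player_b_matrix(p, q, eps):
    """Transition matrix of player B over the states (win, tie, loss)."""
    r = 1 - p - q
    return [[p + eps, r, q - eps],
            [p,       r, q],
            [p - eps, r, q + eps]]


def player_b_stationary(p, q, eps):
    """Stationary probabilities from the solution of the linear system."""
    r = 1 - p - q
    shift = eps * (p - q) / (1 - 2 * eps)
    return [p + shift, r, q - shift]


def is_invariant(dist, P):
    """Check dist * P == dist exactly."""
    n = len(P)
    return all(sum(dist[i] * P[i][j] for i in range(n)) == dist[j]
               for j in range(n))


p, q, eps = F(1, 2), F(1, 4), F(1, 10)
dist_b = player_b_stationary(p, q, eps)
print(dist_b, is_invariant(dist_b, player_b_matrix(p, q, eps)))
```

With these values the proportion of wins of player B is 17/32, compared with 1/2 for player A, while the proportion of ties is 1/4 for both.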


Example 13.4.2 Consider the summation of independent integer-valued random variables $\xi_1, \xi_2, \ldots$ modulo some $d > 1$ (see Example 13.1.2). Set $X_0 := 0$, $X_1 := \xi_1 - \lfloor \xi_1/d \rfloor d$, $X_2 := X_1 + \xi_2 - \lfloor (X_1 + \xi_2)/d \rfloor d$ etc. (here $\lfloor x \rfloor$ denotes the integral part of $x$), so that $X_n$ is the remainder of the division of $X_{n-1} + \xi_n$ by $d$. Such summation is sometimes also called summation on a circle (points 0 and $d$ are glued together in a single point). Without loss of generality, we can evidently suppose that $\xi_k$ takes the values $0, 1, \ldots, d - 1$ only. If $\mathbf{P}(\xi_k = j) = p_j$ then

$$p_{ij} = \mathbf{P}(X_n = j \mid X_{n-1} = i) = \begin{cases} p_{j-i} & \text{if } j \ge i, \\ p_{d+j-i} & \text{if } j < i. \end{cases}$$

Assume that the set of all indices $k$ with $p_k > 0$ has a g.c.d. equal to 1. Then it is clear that the chain $\{X_n\}$ has a single class of essential states without subclasses, and there will exist the limits

$$\lim_{n \to \infty} p_{ij}(n) = \pi_j$$

satisfying the system $\sum_i \pi_i p_{ij} = \pi_j$, $\sum_j \pi_j = 1$, $j = 0, \ldots, d - 1$. Now note that the stochastic matrix of transition probabilities $\|p_{ij}\|$ has in this case the following property:

$$\sum_i p_{ij} = \sum_j p_{ij} = 1.$$

Such matrices are called doubly stochastic. Stationary distributions for them are always uniform, since $\pi_j = 1/d$ satisfy the system for final probabilities.

Thus summation of arbitrary random variables on a circle leads to the uniform limit distribution. The rate of convergence of $p_{ij}(k)$ to the stationary distribution is exponential.

It is not difficult to see that the convolution of two uniform distributions under addition modulo $d$ is also uniform. The uniform distribution is in this sense stable. Moreover, the convolution of an arbitrary distribution with the uniform distribution will also be uniform. Indeed, if $\eta$ is uniformly distributed and independent of $\xi_1$ then (addition and subtraction are modulo $d$, $p_j = \mathbf{P}(\xi_1 = j)$)

$$\mathbf{P}(\xi_1 + \eta = k) = \sum_{j=0}^{d-1} p_j\, \mathbf{P}(\eta = k - j) = \sum_{j=0}^{d-1} p_j\, \frac{1}{d} = \frac{1}{d}.$$

Thus, if one transmits a certain signal taking d possible values (for example, 
letters) and (uniform) “random” noise is superimposed on it, then the received signal 
will also have the uniform distribution and therefore will contain no information 
about the transmitted signal. This fact is widely used in cryptography. 

This example also deserves attention as a simple illustration of laws that appear 
when summing random variables taking values not in the real line but in some group 
(the set of numbers 0, 1,...,d — 1 with addition modulo d forms a finite Abelian 
group). It turns out that the phenomenon discovered in the example—the uniformity 
of the limit distribution—holds for a much broader class of groups. 
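The two facts used in this example, that the transition matrix of summation modulo d is doubly stochastic and that convolution with the uniform distribution is uniform, can be checked exactly. The Python sketch below is an illustration only: the names `circulant` and `convolve_mod_d` and the distribution (1/2, 1/3, 1/6) with d = 3 are assumptions.

```python
from fractions import Fraction as F


def circulant(p):
    """Matrix with entries p_ij = p[(j - i) mod d] for summation modulo d."""
    d = len(p)
    return [[p[(j - i) % d] for j in range(d)] for i in range(d)]


def convolve_mod_d(p, u):
    """Distribution of (xi + eta) mod d for independent xi ~ p, eta ~ u."""
    d = len(p)
    return [sum(p[j] * u[(k - j) % d] for j in range(d)) for k in range(d)]


p = [F(1, 2), F(1, 3), F(1, 6)]      # hypothetical distribution of xi
d = len(p)
P = circulant(p)
row_sums = [sum(row) for row in P]
col_sums = [sum(P[i][j] for i in range(d)) for j in range(d)]
uniform = [F(1, d)] * d
print(row_sums, col_sums)            # all equal to 1
print(convolve_mod_d(p, uniform))    # uniform again
```

The second printed line shows that adding independent uniform noise modulo d removes all information about the original distribution.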

We return to arbitrary countable chains. We have already mentioned that the main difficulties when verifying the conditions of Theorem 13.4.1 are usually related to condition (I). We consider this problem in Sect. 13.7 in more detail for a wider class of chains (see Theorems 13.7.2–13.7.3 and corollaries thereafter). Sometimes condition (I) can easily be verified using the results of Chaps. 10 and 12.


Example 13.4.3 We saw in Sect. 12.5 that waiting times in the queueing system satisfy the relationships

$$X_{n+1} = \max(X_n + \xi_{n+1}, 0), \qquad w_1 = 0,$$


where the $\xi_n$ are independent and identically distributed. Clearly, $X_n$ form a homogeneous Markov chain with the state space $\{0, 1, \ldots\}$, provided that the $\xi_k$ are integer-valued. The sequence $X_n$ may be interpreted as a walk with a delaying screen at the point 0. If $\mathbf{E}\xi_k < 0$ then it is not hard to derive from the theorems of Chap. 10 (see also Sect. 13.7) that the recurrence time to 0 has finite expectation. Thus, applying the ergodic theorem we can, independently of Sect. 11.4, come to the conclusion that there exists a limiting (stationary) distribution for $X_n$ as $n \to \infty$ (or, taking into account what we said in Sect. 11.4, conclude that $\sup_{k \ge 0} S_k$ is finite, where $S_k = \sum_{j=1}^{k} \xi_j$, which is essentially the assertion of Theorem 10.2.1).
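For one simple special case the stationary distribution of the walk with a delaying screen can be written down and verified directly. The Python sketch below is an illustration under an added assumption that is not made in the text: the steps take only the values +1 and -1 with probabilities p and q = 1 - p, p < 1/2, so that the mean step is negative. In this case the geometric distribution $\pi_k = (1 - \rho)\rho^k$, $\rho = p/q$, satisfies the balance equations; the function names are hypothetical.

```python
from fractions import Fraction as F


def geometric_stationary(p, n_states):
    """First n_states values of pi_k = (1 - rho) rho^k with rho = p / q."""
    rho = p / (1 - p)
    return [(1 - rho) * rho ** k for k in range(n_states)]


def balance_residuals(p, n_states):
    """pi_k minus the inflow sum_i pi_i p_ik, for k = 0, ..., n_states - 2.

    Transitions of X_{n+1} = max(X_n + xi, 0) with xi = +1 or -1:
    0 -> 0 with probability q, i -> i - 1 with probability q for i >= 1,
    and i -> i + 1 with probability p.
    """
    q = 1 - p
    pi = geometric_stationary(p, n_states)
    residuals = [pi[0] - (pi[0] * q + pi[1] * q)]
    for k in range(1, n_states - 1):
        residuals.append(pi[k] - (pi[k - 1] * p + pi[k + 1] * q))
    return residuals


print(balance_residuals(F(1, 3), 10))   # all residuals are exactly zero
```

The residuals vanish identically, so the geometric sequence is the stationary distribution for this particular step distribution; other step distributions require the general results cited in the example.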


Now we will make several remarks allowing us to state one more criterion for 
ergodicity which is related to the existence of a solution to Eq. (13.4.2). 

First of all, note that Theorem 13.2.2 (the solidarity theorem) can now be complemented as follows. A state $E_j$ is said to be ergodic if, for any $i$, $p_{ij}(n) \to \pi_j > 0$ as $n \to \infty$. A state $E_j$ is said to be positive recurrent if it is recurrent and non-null (in that case, the recurrence time $\tau^{(j)}$ to $E_j$ has finite expectation $\mathbf{E}\tau^{(j)} < \infty$). It follows from Theorem 13.4.1 that, for an irreducible aperiodic chain, a state $E_j$ is ergodic if and only if it is positive recurrent. If at least one state is ergodic, all states are.


Theorem 13.4.3 Suppose a chain is irreducible and aperiodic (satisfies conditions (II)–(III)). Then only one of the following two alternatives can take place: either all the states are null or they are all ergodic. The existence of an absolutely convergent solution to system (13.4.2) is necessary and sufficient for the chain to be ergodic.


Proof The first assertion of the theorem follows from the fact that, by the local renewal Theorem 10.2.2 for the random walk generated by the times of the chain's hitting the state $E_j$, the limit $\lim_{n \to \infty} p_{jj}(n)$ always exists and equals $(\mathbf{E}\tau^{(j)})^{-1}$.

Therefore, to prove sufficiency in the second assertion (the necessity follows from Theorem 13.4.1) we have, in the case of the existence of an absolutely convergent solution $\{\pi_j\}$, to exclude the existence of null states. Assume the contrary, $p_{ij}(n) \to 0$. Choose $j$ such that $\pi_j > 0$. Then

$$0 < \pi_j = \sum_i \pi_i\, p_{ij}(n) \to 0$$

as $n \to \infty$ by dominated convergence. This contradiction completes the proof of the theorem.




13.4.2 The Law of Large Numbers and the Central Limit Theorem 
for the Number of Visits to a Given State 


In conclusion of this section we will give two assertions about the limiting behaviour, as $n \to \infty$, of the number $m_j(n)$ of visits of the system to a fixed state $E_j$ by the time $n$. Let $\tau^{(j)}$ be the recurrence time to the state $E_j$.


Theorem 13.4.4 Let the chain be ergodic and, at the initial time epoch, be at an arbitrary state $E_s$. Then, as $n \to \infty$,

$$\frac{\mathbf{E}\, m_j(n)}{n} \to \pi_j, \qquad \frac{m_j(n)}{n} \xrightarrow{\text{a.s.}} \pi_j.$$

If additionally $\operatorname{Var}(\tau^{(j)}) = \sigma_j^2 < \infty$ then

$$\mathbf{P}\left( \frac{m_j(n) - n\pi_j}{\sigma_j \sqrt{n \pi_j^3}} < x \,\Big|\, X_0 = s \right) \to \Phi(x)$$

as $n \to \infty$, where $\Phi(x)$ is, as before, the distribution function of the normal law with parameters $(0, 1)$.


Proof Note that the sequence $m_j(n) + 1$ coincides with the renewal process formed by the random variables $\tau_1, \tau_2, \tau_3, \ldots$, where $\tau_1$ is the time of the first visit to the state $E_j$ by the system which starts at $E_s$ and $\tau_k \overset{d}{=} \tau^{(j)}$ for $k \ge 2$. Clearly, by the Markov property all $\tau_j$ are independent. Since $\tau_1 \ge 0$ is a proper random variable, Theorem 13.4.4 is a simple consequence of the generalisations of Theorems 10.1.1, 11.5.1, and 10.5.2 that were stated in Remarks 10.1.1, 11.5.1 and 10.5.1, respectively.

The theorem is proved.
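The first assertion of Theorem 13.4.4 can be illustrated without simulation, since the mean number of visits is a sum of transition probabilities: $\mathbf{E}\, m_j(n) = \sum_{t=1}^{n} p_{sj}(t)$ when visits are counted at times $1, \ldots, n$ (this counting convention, the function name `mean_visit_fraction` and the example matrix are assumptions of the sketch below).

```python
def mean_visit_fraction(P, s, j, n):
    """E m_j(n) / n for the chain started at s, with visits at times 1..n."""
    m = len(P)
    row = [1.0 if i == s else 0.0 for i in range(m)]   # law of X_0
    total = 0.0
    for _ in range(n):
        # Law of the next state; row[j] is then p_sj(t).
        row = [sum(row[i] * P[i][k] for i in range(m)) for k in range(m)]
        total += row[j]
    return total / n


# Hypothetical ergodic chain with stationary distribution (1/4, 1/2, 1/4).
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]
for s in range(3):
    print(s, [round(mean_visit_fraction(P, s, j, 2000), 4) for j in range(3)])
```

Whatever the initial state, the mean fraction of time spent in each state is close to the corresponding stationary probability.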


Summarising the contents of this section, one can note that studying the se- 
quences of dependent trials forming homogeneous Markov chains with discrete sets 
of states can essentially be carried out with the help of results obtained for sequences 
of independent random variables. Studying other types of dependent trials requires, 
as a rule, other approaches. 


13.5* The Behaviour of Transition Probabilities for Reducible Chains


Now consider a finite Markov chain of the general type. As we saw, its state space consists of the class of inessential states $S^0$ and several classes $S^1, \ldots, S^l$ of essential states. To clarify the nature of the asymptotic behaviour of $p_{ij}(n)$ for such chains, it suffices to consider the case where essential states constitute a single class without subclasses ($l = 1$). Here, the matrix of transition probabilities $p_{ij}(n)$ has the form depicted in Fig. 13.4.

Fig. 13.4 The structure of the matrix of transition probabilities of a periodic Markov chain with the class $S^0$ of inessential states: an illustration to the proof of Theorem 13.2.3

By virtue of the ergodic theorem, the entries of the submatrix $L$ have positive limits $\pi_j$. Thus it remains to analyse the behaviour of the entries in the upper part of the matrix.


Theorem 13.5.1 Let E; € S°. Then 


if Ej E 5°, 


jim th= 
ee as ae ifE;<¢S'. 


Proof Let $E_j \in S^0$. Set

$$A_j(t) := \max_{E_i \in S^0} p_{ij}(t).$$

For any essential state $E_r$ there exists an integer $t_r$ such that $p_{ir}(t_r) > 0$. Since
transition probabilities in $L$ are all positive starting from some step, there exists an $s$
such that $p_{ir}(s) > 0$ for $E_i \in S^0$ and all $E_r \in S^1$. Therefore, for sufficiently large $t$,

$$p_{ij}(t) = \sum_{E_k \in S^0} p_{ik}(s)\, p_{kj}(t-s) \le A_j(t-s) \sum_{E_k \in S^0} p_{ik}(s),$$

where

$$q(i) := \sum_{E_k \in S^0} p_{ik}(s) = 1 - \sum_{E_k \in S^1} p_{ik}(s) < 1.$$

If we put $q := \max_{E_i \in S^0} q(i)$, then the displayed inequality implies that

$$A_j(t) \le q A_j(t-s) \le \cdots \le q^{[t/s]}.$$

Thus $\lim_{t\to\infty} p_{ij}(t) \le \lim_{t\to\infty} A_j(t) = 0$.

Now let $E_i \in S^0$ and $E_j \in S^1$. One has

$$p_{ij}(t+s) = \sum_{k} p_{ik}(t)\, p_{kj}(s)
= \sum_{E_k \in S^0} p_{ik}(t)\, p_{kj}(s) + \sum_{E_k \in S^1} p_{ik}(t)\, p_{kj}(s).$$




Letting $t$ and $s$ go to infinity, we see that the first sum in the last expression is $o(1)$.
In the second sum,

$$\sum_{E_k \in S^1} p_{ik}(t) = 1 + o(1); \qquad p_{kj}(s) = \pi_j + o(1).$$

Therefore

$$p_{ij}(t+s) = \pi_j \sum_{E_k \in S^1} p_{ik}(t) + o(1) = \pi_j + o(1)$$

as $t, s \to \infty$. The theorem is proved.
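The statement of Theorem 13.5.1 can be observed numerically. The following is a minimal sketch in Python; the 4-state matrix `P` is a hypothetical example (not taken from the text) in which states 0 and 1 play the role of $S^0$ and states 2 and 3 form the single essential class $S^1$. The helper names `mat_mul` and `mat_power` are likewise only illustrative.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(p, t):
    """Return the t-step transition matrix p^t (t >= 1) by repeated squaring."""
    result = None
    base = p
    while t > 0:
        if t & 1:
            result = base if result is None else mat_mul(result, base)
        base = mat_mul(base, base)
        t >>= 1
    return result


# Hypothetical chain: states 0, 1 are inessential (S^0), states 2, 3 essential (S^1).
P = [
    [0.5, 0.2, 0.2, 0.1],
    [0.3, 0.3, 0.1, 0.3],
    [0.0, 0.0, 0.6, 0.4],
    [0.0, 0.0, 0.2, 0.8],
]

# For large t every row should be close to (0, 0, pi_2, pi_3).
P_LIMIT = mat_power(P, 200)
```

For this particular matrix the essential block has stationary distribution $(1/3, 2/3)$, so every row of `P_LIMIT` should be close to $(0, 0, 1/3, 2/3)$, whichever state the chain starts from.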


Using Theorem 13.5.1, it is not difficult to see that the existence of the limit

$$\lim_{n\to\infty} p_{ij}(n) = \pi_j \ge 0$$

is a necessary and sufficient condition for the chain to have two classes $S^0$ and $S^1$,
of which $S^1$ contains no subclasses.


13.6 Markov Chains with Arbitrary State Spaces. Ergodicity of 
Chains with Positive Atoms 


13.6.1 Markov Chains with Arbitrary State Spaces 


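The Markov chains $X = \{X_n\}$ considered so far have taken values in the countable
sets $\{1, 2, \ldots\}$ or $\{0, 1, \ldots\}$; such chains are called countable (denumerable)
or discrete. Now we will consider Markov chains with values in an arbitrary set of
states $\mathcal{X}$ endowed with a $\sigma$-algebra $\mathfrak{B}_{\mathcal{X}}$ of subsets of $\mathcal{X}$. The pair
$(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ forms a (measurable) state space of the chain $\{X_n\}$. Further let
$(\Omega, \mathfrak{F}, \mathbf{P})$ be the underlying probability space. A measurable mapping $Y$ of the
space $(\Omega, \mathfrak{F})$ into $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ is called an $\mathcal{X}$-valued random element. If
$\mathcal{X} = \mathbb{R}$ and $\mathfrak{B}_{\mathcal{X}}$ is the $\sigma$-algebra of Borel sets on the line, then $Y$ will be a
conventional random variable. The mapping $Y$ could be the identity, in which case
$(\Omega, \mathfrak{F}) = (\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ is also called a sample space.

Consider a sequence $\{X_n\}$ of $\mathcal{X}$-valued random elements and denote by $\mathfrak{F}_{k,m}$,
$m \ge k$, the $\sigma$-algebra generated by the elements $X_k, \ldots, X_m$ (i.e. by events of
the form $\{X_k \in B_k\}, \ldots, \{X_m \in B_m\}$, $B_i \in \mathfrak{B}_{\mathcal{X}}$, $i = k, \ldots, m$). It is evident that
$\mathfrak{F}_n := \mathfrak{F}_{0,n}$ form a non-decreasing sequence
$\mathfrak{F}_0 \subset \mathfrak{F}_1 \subset \cdots \subset \mathfrak{F}_n \subset \cdots$. The conditional
expectation $\mathbf{E}(\xi \mid \mathfrak{F}_{k,m})$ will sometimes also be denoted by $\mathbf{E}(\xi \mid X_k, \ldots, X_m)$.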


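Definition 13.6.1 An $\mathcal{X}$-valued Markov chain is a sequence of $\mathcal{X}$-valued elements
$X_n$ such that, for any $B \in \mathfrak{B}_{\mathcal{X}}$,

$$\mathbf{P}(X_{n+1} \in B \mid \mathfrak{F}_n) = \mathbf{P}(X_{n+1} \in B \mid X_n) \quad \text{a.s.} \tag{13.6.1}$$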




In the sequel, the words “almost surely” will, as a rule, be omitted. 
By the properties of conditional expectations, relation (13.6.1) is clearly equivalent
to the condition: for any measurable function $f : \mathcal{X} \to \mathbb{R}$, one has

$$\mathbf{E}\big(f(X_{n+1}) \mid \mathfrak{F}_n\big) = \mathbf{E}\big(f(X_{n+1}) \mid X_n\big). \tag{13.6.2}$$
Definition 13.6.1 is equivalent to the following. 


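Definition 13.6.2 A sequence $X = \{X_n\}$ forms a Markov chain if, for any
$A \in \mathfrak{F}_{n+1,\infty}$,

$$\mathbf{P}(A \mid \mathfrak{F}_n) = \mathbf{P}(A \mid X_n) \tag{13.6.3}$$

or, which is the same, for any $\mathfrak{F}_{n+1,\infty}$-measurable function $f(\omega)$,

$$\mathbf{E}\big(f(\omega) \mid \mathfrak{F}_n\big) = \mathbf{E}\big(f(\omega) \mid X_n\big). \tag{13.6.4}$$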


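Proof of equivalence We have to show that (13.6.2) implies (13.6.3). First take any
$B_1, B_2 \in \mathfrak{B}_{\mathcal{X}}$ and let $A := \{X_{n+1} \in B_1, X_{n+2} \in B_2\}$. Then, by virtue of (13.6.2),

$$
\begin{aligned}
\mathbf{P}(A \mid \mathfrak{F}_n)
&= \mathbf{E}\big[\mathrm{I}(X_{n+1} \in B_1)\, \mathbf{P}(X_{n+2} \in B_2 \mid \mathfrak{F}_{n+1}) \,\big|\, \mathfrak{F}_n\big] \\
&= \mathbf{E}\big[\mathrm{I}(X_{n+1} \in B_1)\, \mathbf{P}(X_{n+2} \in B_2 \mid X_{n+1}) \,\big|\, \mathfrak{F}_n\big] \\
&= \mathbf{P}(A \mid X_n).
\end{aligned}
$$

This implies equality (13.6.3) for any $A \in \mathcal{A}_{n+1,n+2}$, where $\mathcal{A}_{k,m}$ is the algebra
generated by sets $\{X_k \in B_k, \ldots, X_m \in B_m\}$. It is clear that $\mathcal{A}_{n+1,n+2}$ generates
$\mathfrak{F}_{n+1,n+2}$. Now let $A \in \mathfrak{F}_{n+1,n+2}$. Then, by the approximation theorem, there exist
$A_k \in \mathcal{A}_{n+1,n+2}$ such that $d(A, A_k) \to 0$ (see Sect. 3.4). From this it follows that
$\mathrm{I}(A_k) \xrightarrow{p} \mathrm{I}(A)$ and, by the properties of conditional expectations (see Sect. 4.8.2),

$$\mathbf{P}(A_k \mid \mathfrak{F}^*) \xrightarrow{p} \mathbf{P}(A \mid \mathfrak{F}^*),$$

where $\mathfrak{F}^* \subset \mathfrak{F}$ is some $\sigma$-algebra. Put $P_A = P_A(\omega) := \mathbf{P}(A \mid X_n)$. We know that, for
$A_k \in \mathcal{A}_{n+1,n+2}$,

$$\mathbf{E}(P_{A_k}; B) = \mathbf{P}(A_k B) \tag{13.6.5}$$

for any $B \in \mathfrak{F}_n$ (this just means that $P_{A_k}(\omega) = \mathbf{P}(A_k \mid \mathfrak{F}_n)$). Again making use of
the properties of conditional expectations (the dominated convergence theorem, see
Sect. 4.8.2) and passing to the limit in (13.6.5), we obtain that $\mathbf{E}(P_A; B) = \mathbf{P}(AB)$.
This proves (13.6.3) for $A \in \mathfrak{F}_{n+1,n+2}$.

Repeating the above argument $m$ times, we prove (13.6.3) for $A \in \mathfrak{F}_{n+1,m}$. Using
a similar scheme, we can proceed to the case of $A \in \mathfrak{F}_{n+1,\infty}$.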


Note that (13.6.3) can easily be extended to events $A \in \mathfrak{F}_{n,\infty}$. In the above proof
of equivalence, one could work from the very beginning with $A \in \mathfrak{F}_{n,\infty}$ (first with
$A \in \mathcal{A}_{n,n+2}$, and so on).

We will give one more equivalent definition of the Markov property. 




Definition 13.6.3 A sequence $\{X_n\}$ forms a Markov chain if, for any events
$A \in \mathfrak{F}_n$ and $B \in \mathfrak{F}_{n,\infty}$,

$$\mathbf{P}(AB \mid X_n) = \mathbf{P}(A \mid X_n)\, \mathbf{P}(B \mid X_n). \tag{13.6.6}$$

This property means that the future is conditionally independent of the past given
the present (conditional independence of $\mathfrak{F}_n$ and $\mathfrak{F}_{n,\infty}$ given $X_n$).


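Proof of the equivalence Assume that (13.6.4) holds. Then, for $A \in \mathfrak{F}_n$ and
$B \in \mathfrak{F}_{n,\infty}$,

$$
\begin{aligned}
\mathbf{P}(AB \mid X_n)
&= \mathbf{E}\big[\mathbf{E}(\mathrm{I}_A \mathrm{I}_B \mid \mathfrak{F}_n) \,\big|\, X_n\big]
 = \mathbf{E}\big[\mathrm{I}_A\, \mathbf{E}(\mathrm{I}_B \mid \mathfrak{F}_n) \,\big|\, X_n\big] \\
&= \mathbf{E}\big[\mathrm{I}_A\, \mathbf{E}(\mathrm{I}_B \mid X_n) \,\big|\, X_n\big]
 = \mathbf{E}(\mathrm{I}_A \mid X_n)\, \mathbf{E}(\mathrm{I}_B \mid X_n),
\end{aligned}
$$

where $\mathrm{I}_A$ is the indicator of the event $A$.
Conversely, let (13.6.6) hold. Then

$$
\begin{aligned}
\mathbf{P}(AB) &= \mathbf{E}\,\mathbf{P}(AB \mid X_n) = \mathbf{E}\,\mathbf{P}(A \mid X_n)\, \mathbf{P}(B \mid X_n) \\
&= \mathbf{E}\,\mathbf{E}\big[\mathrm{I}_A\, \mathbf{P}(B \mid X_n) \,\big|\, X_n\big] = \mathbf{E}\,\mathrm{I}_A\, \mathbf{P}(B \mid X_n).
\end{aligned}
\tag{13.6.7}
$$

On the other hand,

$$\mathbf{P}(AB) = \mathbf{E}\,\mathrm{I}_A \mathrm{I}_B = \mathbf{E}\,\mathrm{I}_A\, \mathbf{P}(B \mid \mathfrak{F}_n). \tag{13.6.8}$$

Since (13.6.7) and (13.6.8) hold for any $A \in \mathfrak{F}_n$, this means that

$$\mathbf{P}(B \mid X_n) = \mathbf{P}(B \mid \mathfrak{F}_n).$$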


Thus, let $\{X_n\}$ be an $\mathcal{X}$-valued Markov chain. Then, by the properties of
conditional expectations,

$$\mathbf{P}(X_{n+1} \in B \mid X_n) = P_{(n)}(X_n, B),$$

where the function $P_{(n)}(x, B)$ is, for each $B \in \mathfrak{B}_{\mathcal{X}}$, measurable in $x$ with respect to
the $\sigma$-algebra $\mathfrak{B}_{\mathcal{X}}$. In what follows, we will assume that the functions $P_{(n)}(x, B)$
are conditional distributions (see Definition 4.9.1), i.e., for each $x \in \mathcal{X}$, $P_{(n)}(x, B)$
is a probability distribution in $B$. Conditional distributions $P_{(n)}(x, B)$ always exist
if the $\sigma$-algebra $\mathfrak{B}_{\mathcal{X}}$ is countably-generated, i.e. generated by a countable
collection of subsets of $\mathcal{X}$ (see [27]). This condition is always met if $\mathcal{X} = \mathbb{R}^d$ and
$\mathfrak{B}_{\mathcal{X}}$ is the $\sigma$-algebra of Borel sets. In our case, there is an additional problem that the
"null probability" sets $N \subset \mathcal{X}$, on which one can arbitrarily vary $P_{(n)}(x, B)$, can
depend on the distribution of $X_n$, since the "null probability" is with respect to the
distribution of $X_n$.


Definition 13.6.4 A Markov chain $X = \{X_n\}$ is called homogeneous if there exist
conditional distributions $P_{(n)}(x, B) = P(x, B)$ independent of $n$ and the initial
value $X_0$ (or the distributions of $X_n$). The function $P(x, B)$ is called the transition
probability (or transition function) of the homogeneous Markov chain. It can be
graphically written as

$$P(x, B) = \mathbf{P}(X_1 \in B \mid X_0 = x). \tag{13.6.9}$$

If the Markov chain is countable, $\mathcal{X} = \{1, 2, \ldots\}$, then, in the notation of Sect. 13.1,
one has $P(i, \{j\}) = p_{ij} = p_{ij}(1)$.


The transition probability and initial distribution (of $X_0$) completely determine
the joint distribution of $X_0, \ldots, X_n$ for any $n$. Indeed, by the total probability
formula and the Markov property

$$
\mathbf{P}(X_0 \in B_0, \ldots, X_n \in B_n)
= \int_{y_0 \in B_0} \cdots \int_{y_n \in B_n}
\mathbf{P}(X_0 \in dy_0)\, P(y_0, dy_1) \cdots P(y_{n-1}, dy_n).
\tag{13.6.10}
$$


A Markov chain with the initial value $X_0 = x$ will be denoted by $\{X_n(x)\}$.

In applications, Markov chains are usually given by their conditional distributions
$P(x, B)$ or, in a "stronger form", by explicit formulas expressing $X_{n+1}$
in terms of $X_n$ and certain "control" elements (see Examples 13.4.2, 13.4.3, 13.6.1,
13.6.2, 13.7.1-13.7.3) which enable one to immediately write down transition
probabilities. In such cases, as we already mentioned, the joint distribution of
$(X_0, \ldots, X_n)$ can be defined in terms of the initial distribution of $X_0$ and the
transition function $P(x, B)$ by formula (13.6.10). It is easily seen that the sequence $\{X_n\}$
with so defined joint distributions satisfies all the definitions of a Markov chain and
has transition function $P(x, B)$. In what follows, wherever it is needed, we will
assume condition (13.6.10) is satisfied. It can be considered as one more definition of
a Markov chain, but a stronger one than Definitions 13.6.2-13.6.4, for it explicitly
gives (or uses) the transition function $P(x, B)$.

One of the main objects of study will be the asymptotic behaviour of the $n$ step
transition probability:

$$P(x, n, B) := \mathbf{P}(X_n(x) \in B) = \mathbf{P}(X_n \in B \mid X_0 = x).$$

The following recursive relation, which follows from the total probability formula
(or from (13.6.10)), holds for this function:

$$\mathbf{P}(X_{n+1} \in B) = \mathbf{E}\,\mathbf{E}\big(\mathrm{I}(X_{n+1} \in B) \mid X_n\big) = \int \mathbf{P}(X_n \in dy)\, P(y, B),$$

$$P(x, n+1, B) = \int P(x, n, dy)\, P(y, B). \tag{13.6.11}$$
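For a finite state space the integral in (13.6.11) becomes a finite sum, and the recursion can be iterated directly. The sketch below assumes a hypothetical two-state transition matrix `P` with exact rational entries; the function names `step` and `n_step` are illustrative and do not come from the text.

```python
from fractions import Fraction


def step(dist, P):
    """One application of (13.6.11): new[j] = sum_y dist[y] * P[y][j]."""
    n = len(P)
    return [sum(dist[y] * P[y][j] for y in range(n)) for j in range(n)]


def n_step(x, n, P):
    """Distribution P(x, n, .) of X_n(x), obtained by iterating `step` n times."""
    dist = [Fraction(int(i == x)) for i in range(len(P))]  # point mass at x
    for _ in range(n):
        dist = step(dist, P)
    return dist


# Hypothetical two-state transition matrix (each row sums to 1).
P = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 4), Fraction(3, 4)]]
```

With this matrix, `n_step(0, 2, P)` gives the two-step distribution started from state 0, and the entries of every `n_step(x, n, P)` sum to one, as they must for a probability distribution.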


Now note that the Markov property (13.6.3) of homogeneous chains can also be
written in the form

$$\mathbf{P}(X_{n+k} \in B_k \mid \mathfrak{F}_n) = P(X_n, k, B_k),$$

or, more generally,

$$
\mathbf{P}(X_{n+1} \in B_1, \ldots, X_{n+k} \in B_k \mid \mathfrak{F}_n)
= \mathbf{P}\big(X^{\mathrm{new}}_1(X_n) \in B_1, \ldots, X^{\mathrm{new}}_k(X_n) \in B_k\big),
\tag{13.6.12}
$$

where $\{X^{\mathrm{new}}_k(x)\}$ is a Markov chain independent of $\{X_n\}$ and having the same
transition function as $\{X_n\}$ and the initial value $x$. Property (13.6.12) can be extended
to a random time $n$. Recall the definition of a stopping time.


Definition 13.6.5 A random variable $\nu \ge 0$ is called a Markov or stopping time with
respect to $\{\mathfrak{F}_n\}$ if $\{\nu \le n\} \in \mathfrak{F}_n$. In other words, whether the event $\{\nu \le n\}$ occurred or
not is completely determined by the trajectory segment $X_0, X_1, \ldots, X_n$.

Note that, in Definition 13.6.5, by $\mathfrak{F}_n$ one often understands wider $\sigma$-algebras,
the essential requirements being the relations $\{\nu \le n\} \in \mathfrak{F}_n$ and measurability of
$X_0, \ldots, X_n$ with respect to $\mathfrak{F}_n$.

Denote by $\mathfrak{F}_\nu$ the $\sigma$-algebra of events $B$ such that $B \cap \{\nu = k\} \in \mathfrak{F}_k$. In other
words, $\mathfrak{F}_\nu$ can be thought of as the $\sigma$-algebra generated by the sets $\{\nu = k\} B_k$,
$B_k \in \mathfrak{F}_k$, i.e. by the trajectory of $\{X_n\}$ until time $\nu$.


Lemma 13.6.1 (The Strong Markov Property) For any $k \ge 1$ and $B_1, \ldots, B_k \in \mathfrak{B}_{\mathcal{X}}$,

$$
\mathbf{P}(X_{\nu+1} \in B_1, \ldots, X_{\nu+k} \in B_k \mid \mathfrak{F}_\nu)
= \mathbf{P}\big(X^{\mathrm{new}}_1(X_\nu) \in B_1, \ldots, X^{\mathrm{new}}_k(X_\nu) \in B_k\big),
$$

where the process $\{X^{\mathrm{new}}_k\}$ is defined in (13.6.12).

Thus, after a random stopping time $\nu$, the trajectory $X_{\nu+1}, X_{\nu+2}, \ldots$ will evolve
according to the same laws as $X_1, X_2, \ldots$, but with the initial condition $X_\nu$. This
property is called the strong Markov property. It will be used below for the first
hitting times $\nu = \tau_V$ of certain sets $V \subset \mathcal{X}$ by $\{X_n\}$. We have already used this
property tacitly in Sect. 13.4, when the set $V$ coincided with a point, which allowed
us to cut the trajectory of $\{X_n\}$ into independent cycles.


Proof of Lemma 13.6.1 For the sake of simplicity, consider one-dimensional
distributions. We have to prove that

$$\mathbf{P}(X_{\nu+1} \in B_1 \mid \mathfrak{F}_\nu) = P(X_\nu, B_1).$$

For any $A \in \mathfrak{F}_\nu$,

$$
\begin{aligned}
\mathbf{E}\big(P(X_\nu, B_1); A\big)
&= \sum_n \mathbf{E}\big(P(X_n, B_1); A\{\nu = n\}\big) \\
&= \sum_n \mathbf{E}\,\mathbf{E}\big(\mathrm{I}(A\{\nu = n\}\{X_{n+1} \in B_1\}) \mid \mathfrak{F}_n\big) \\
&= \sum_n \mathbf{P}\big(A\{\nu = n\}\{X_{n+1} \in B_1\}\big) = \mathbf{P}\big(A\{X_{\nu+1} \in B_1\}\big).
\end{aligned}
$$

But this just means that $P(X_\nu, B_1)$ is the required conditional expectation. The case
of multi-dimensional distributions is dealt with in the same way, and we leave it to
the reader.


Now we turn to the asymptotic properties of the distributions $P(x, n, B)$ as
$n \to \infty$.

Definition 13.6.6 A distribution $\pi(\cdot)$ on $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ is called invariant if it satisfies
the equation

$$\pi(B) = \int \pi(dy)\, P(y, B), \qquad B \in \mathfrak{B}_{\mathcal{X}}. \tag{13.6.13}$$

It follows from (13.6.11) that if $X_n$ has the distribution $\pi$, then so does $X_{n+1}$. The
distribution $\pi$ is also called stationary.
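On a finite state space equation (13.6.13) reduces to the linear system $\pi(j) = \sum_y \pi(y) P(y, \{j\})$, which can be checked exactly in rational arithmetic. The following sketch uses a hypothetical two-state matrix `P` and a candidate vector `PI`; the function name `is_invariant` is illustrative only.

```python
from fractions import Fraction


def is_invariant(pi, P):
    """Check pi(j) == sum_y pi(y) * P(y, {j}) for every state j, exactly."""
    n = len(P)
    return all(sum(pi[y] * P[y][j] for y in range(n)) == pi[j]
               for j in range(n))


# Hypothetical two-state chain and a candidate invariant distribution.
P = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 4), Fraction(3, 4)]]
PI = [Fraction(1, 3), Fraction(2, 3)]
```

Here `is_invariant(PI, P)` holds, while the uniform vector $(1/2, 1/2)$ fails the check, so it is not stationary for this chain.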

For Markov chains in arbitrary state spaces X, a simple and complete classifica- 
tion similar to the one carried out for countable chains in Sect. 13.1 is not possible, 
although some notions can be extended to the general case. 

Such natural and important notions for countable chains as, say, irreducibility of 
a chain, take in the general case another form. 


Example 13.6.1 Let $X_{n+1} = X_n + \xi_n \pmod 1$ ($X_{n+1}$ is the fractional part of
$X_n + \xi_n$), where the $\xi_n$ are independent and identically distributed and take with
positive probabilities the two values $0$ and $\sqrt{2}$. In this example, the chain "splits",
according to the initial state $x$, into a continual set of "subchains" with state spaces of the
form $M_x = \{x + k\sqrt{2} \pmod 1,\ k = 0, 1, 2, \ldots\}$. It is evident that if $x_1 - x_2$ is not a
multiple of $\sqrt{2} \pmod 1$, then $M_{x_1}$ and $M_{x_2}$ are disjoint, $\mathbf{P}(X_n(x_1) \in M_{x_2}) = 0$ and
$\mathbf{P}(X_n(x_2) \in M_{x_1}) = 0$ for all $n$. Thus the chain is clearly reducible. Nevertheless, it
turns out that the chain is ergodic in the following sense: for any $x$, the distribution of
$X_n(x)$ converges to the uniform distribution $\mathbf{U}_{0,1}$ on $[0, 1]$
($P(x, n, [0, t]) \to t$) as $n \to \infty$ (see, e.g., [6], [18]). For the most commonly used
irreducibility conditions, see Sect. 13.7.
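The convergence $P(x, n, [0, t]) \to t$ in Example 13.6.1 can be explored numerically. Since $X_n(x)$ is the fractional part of $x + K\sqrt{2}$, where $K$ counts the steps on which $\xi = \sqrt{2}$, the probability is a binomial sum. The sketch below assumes that $\xi = \sqrt{2}$ with probability `p` (default $1/2$, an assumption, as the text only requires both values to have positive probability) and works in floating point, so it is an illustration rather than a proof.

```python
import math

SQRT2 = math.sqrt(2.0)


def prob_in_interval(x, n, t, p=0.5):
    """Approximate P(x, n, [0, t]) for the chain X_{n+1} = X_n + xi_n (mod 1).

    X_n(x) = frac(x + K * sqrt(2)) with K ~ Binomial(n, p), 0 < p < 1.
    Binomial weights are computed through log-gamma to avoid overflow.
    """
    log_p, log_q = math.log(p), math.log(1.0 - p)
    total = 0.0
    for k in range(n + 1):
        log_w = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                 + k * log_p + (n - k) * log_q)
        if (x + k * SQRT2) % 1.0 <= t:
            total += math.exp(log_w)
    return total
```

For instance, `prob_in_interval(0.1, 2000, 0.3)` can be compared with $t = 0.3$. The distribution of $X_n(x)$ stays purely atomic for every $n$, so only weak convergence (not convergence in total variation) can be expected here.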


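Definition 13.6.7 A chain is called periodic if there exist an integer $d \ge 2$ and a
set $\mathcal{X}_1 \subset \mathcal{X}$ such that, for $x \in \mathcal{X}_1$, one has $P(x, n, \mathcal{X}_1) = \mathbf{P}(X_n(x) \in \mathcal{X}_1) = 1$ for
$n = kd$, $k = 1, 2, \ldots$, and $P(x, n, \mathcal{X}_1) = 0$ for $n \ne kd$.

Periodicity means that the whole set of states $\mathcal{X}$ is decomposed into subclasses
$\mathcal{X}_1, \ldots, \mathcal{X}_d$, such that $\mathbf{P}(X_1(x) \in \mathcal{X}_{k+1}) = 1$ for $x \in \mathcal{X}_k$, $k = 1, \ldots, d$,
$\mathcal{X}_{d+1} = \mathcal{X}_1$. In the absence of such a property, the chain will be called aperiodic.

A state $x_0 \in \mathcal{X}$ is called an atom of the chain $X$ if, for any $x \in \mathcal{X}$,

$$\mathbf{P}\Big(\bigcup_{n=1}^{\infty} \{X_n(x) = x_0\}\Big) = 1.$$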

Example 13.6.2 Let $X_0 \ge 0$ and, for $n \ge 0$,

$$
X_{n+1} =
\begin{cases}
(X_n + \xi_{n+1})^+ & \text{if } X_n > 0, \\
\eta_{n+1} & \text{if } X_n = 0,
\end{cases}
$$

where $\xi_n$ and $\eta_n \ge 0$, $n = 1, 2, \ldots$, are two sequences of independent random
variables, identically distributed in each sequence. It is clear that $\{X_n\}$ is a Markov chain
and, for $\mathbf{E}\xi_1 < 0$, by the strong law of large numbers, this chain has an atom at the
point $x_0 = 0$:

$$\mathbf{P}\Big(\bigcup_{n=1}^{\infty} \{X_n(x) = 0\}\Big) = \mathbf{P}\Big(\inf_k S_k \le -x\Big) = 1,$$

where $S_k = \sum_{j=1}^{k} \xi_j$. This chain is a generalisation of the Markov chain from
Example 13.4.3.
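A short simulation makes the role of the atom in Example 13.6.2 visible. The sketch below assumes, purely for illustration, that $\xi_n$ is uniform on $[-0.75, 0.25)$ (so that $\mathbf{E}\xi_1 = -0.25 < 0$) and that $\eta_n$ is uniform on $[0, 1)$; neither law is specified in the text, and the function name `simulate` is hypothetical.

```python
import random


def simulate(n_steps, x0=0.0, seed=0):
    """Simulate X_{n+1} = max(X_n + xi, 0) if X_n > 0, and X_{n+1} = eta if X_n == 0."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n_steps):
        if x > 0:
            xi = rng.random() - 0.75   # assumed law of xi: uniform on [-0.75, 0.25)
            x = max(x + xi, 0.0)
        else:
            x = rng.random()           # assumed law of eta: uniform on [0, 1)
        path.append(x)
    return path


PATH = simulate(10_000)
# Number of returns to the atom x0 = 0 after time 0.
VISITS_TO_ATOM = sum(1 for x in PATH[1:] if x == 0.0)
```

Because the increments have negative mean, the trajectory keeps returning to $0$, and the successive returns cut it into cycles, which is the property exploited in the ergodic theorem for chains with a positive atom.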

Markov chains in an arbitrary state space $\mathcal{X}$ are rather difficult to study. However,
if a chain has an atom, the situation may become much simpler, and the ergodic
theorem on the asymptotic behaviour of $P(x, n, B)$ as $n \to \infty$ can be proved using
the approaches considered in the previous sections.


13.6.2 Markov Chains Having a Positive Atom


Let $x_0$ be an atom of a chain $\{X_n\}$. Set

$$\tau := \min\{k > 0 : X_k(x_0) = x_0\}.$$

This is a proper random variable ($\mathbf{P}(\tau < \infty) = 1$).

Definition 13.6.8 The atom $x_0$ is said to be positive if $\mathbf{E}\tau < \infty$.

In the terminology of Sect. 13.4, $x_0$ is a recurrent non-null (positive) state.

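To characterise convergence of distributions in arbitrary spaces, we will need the
notions of the total variation distance and convergence in total variation. If $\mathbf{P}$ and $\mathbf{Q}$
are two distributions on $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$, then the total variation distance between them is
defined by

$$\|\mathbf{P} - \mathbf{Q}\| = 2 \sup_{B \in \mathfrak{B}_{\mathcal{X}}} |\mathbf{P}(B) - \mathbf{Q}(B)|.$$

One says that a sequence of distributions $\mathbf{P}_n$ on $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ converges in total variation
to $\mathbf{P}$ ($\mathbf{P}_n \xrightarrow{TV} \mathbf{P}$) if $\|\mathbf{P}_n - \mathbf{P}\| \to 0$ as $n \to \infty$. For more details, see Sect. 3.6.2 of
Appendix 3.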
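For distributions on a finite set the supremum in the definition is attained on the set $B = \{i : p_i > q_i\}$, which gives $\|\mathbf{P} - \mathbf{Q}\| = \sum_i |p_i - q_i|$. The sketch below computes the distance both ways; the function names are illustrative, and the brute-force version enumerates all subsets, so it is only meant for very small state spaces.

```python
from fractions import Fraction
from itertools import combinations


def total_variation(p, q):
    """||P - Q|| for two distributions on a finite set, as sum_i |p_i - q_i|."""
    return sum(abs(a - b) for a, b in zip(p, q))


def total_variation_brute(p, q):
    """The same quantity straight from the definition 2 * sup_B |P(B) - Q(B)|."""
    n = len(p)
    best = Fraction(0)
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            diff = abs(sum(p[i] - q[i] for i in subset))
            best = max(best, diff)
    return 2 * best
```

With the normalisation used here (the factor 2 in front of the supremum) the distance ranges from 0 for identical distributions to 2 for mutually singular ones.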
As in Sect. 13.4, denote by $P_{x_0}(k, B)$ the "taboo probability"

$$P_{x_0}(k, B) := \mathbf{P}\big(X_k(x_0) \in B,\ X_1(x_0) \ne x_0, \ldots, X_{k-1}(x_0) \ne x_0\big)$$

of transition from $x_0$ into $B$ in $k$ steps without visiting the "forbidden" state $x_0$.


Theorem 13.6.1 If the chain $\{X_n\}$ has a positive atom and the g.c.d. of the possible
values of $\tau$ is 1, then the chain is ergodic in the convergence in total variation sense:
there exists a unique invariant distribution $\pi$ such that, for any $x \in \mathcal{X}$, as $n \to \infty$,

$$\|P(x, n, \cdot) - \pi(\cdot)\| \to 0. \tag{13.6.14}$$

Moreover, for any $B \in \mathfrak{B}_{\mathcal{X}}$,

$$\pi(B) = \frac{1}{\mathbf{E}\tau} \sum_{k=1}^{\infty} P_{x_0}(k, B). \tag{13.6.15}$$

If we denote by $X_n(\mu_0)$ a Markov chain with the initial distribution $\mu_0$ (that is, $X_0$ is
distributed according to $\mu_0$) and put

$$P(\mu_0, n, B) := \mathbf{P}(X_n(\mu_0) \in B) = \int \mu_0(dx)\, P(x, n, B),$$

then, as well as (13.6.14), we will also have that, as $n \to \infty$,

$$\|P(\mu_0, n, \cdot) - \pi(\cdot)\| \to 0 \tag{13.6.16}$$

for any initial distribution $\mu_0$.


The condition that there exists a positive atom is an analogue of conditions (I)
and (II) of Theorem 13.4.1. A number of conditions sufficient for the finiteness of
$\mathbf{E}\tau$ can be found in Sect. 13.7. The condition on the g.c.d. of possible values of $\tau$ is
the aperiodicity condition.
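Representation (13.6.15) can be tried out on a finite chain, where every state is an atom. The sketch below accumulates the taboo probabilities $P_{x_0}(k, \{j\})$ step by step and normalises by $\mathbf{E}\tau = \sum_k \mathbf{P}(\tau \ge k)$. The function name `invariant_from_taboo`, the truncation level `n_terms` and the two-state matrix in the usage note are assumptions made for illustration.

```python
def invariant_from_taboo(P, x0, n_terms=2000):
    """Approximate pi(j) = (1 / E tau) * sum_{k>=1} P_{x0}(k, {j}) for a finite chain.

    Returns the pair (pi, e_tau); the infinite series is truncated after n_terms.
    """
    n = len(P)
    taboo = list(P[x0])          # k = 1: a single step from x0, nothing forbidden yet
    sums = [0.0] * n
    for _ in range(n_terms):
        for j in range(n):
            sums[j] += taboo[j]
        # Discard the paths that have just returned to x0, advance the others.
        alive = [0.0 if y == x0 else taboo[y] for y in range(n)]
        taboo = [sum(alive[y] * P[y][j] for y in range(n)) for j in range(n)]
    e_tau = sum(sums)            # sum over j of P_{x0}(k, {j}) equals P(tau >= k)
    return [s / e_tau for s in sums], e_tau
```

For the hypothetical matrix `[[0.5, 0.5], [0.25, 0.75]]` and `x0 = 0` the mean return time is 3 and the result is close to $(1/3, 2/3)$, in agreement with $\pi(\{x_0\}) = 1/\mathbf{E}\tau$.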


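Proof We will effectively repeat the proof of Theorem 13.4.1. First let $X_0 = x_0$. As
in Theorem 13.4.1 (we keep the notation of that theorem), we find that

$$
\begin{aligned}
P(x_0, n, B)
&= \sum_{k=1}^{n} \mathbf{P}(\gamma(n) = k)\,
   \mathbf{P}(X_n \in B \mid X_{n-k} = x_0,\ X_{n-k+1} \ne x_0, \ldots, X_{n-1} \ne x_0) \\
&= \sum_{k=1}^{n} \mathbf{P}(\gamma(n) = k)\,
   \mathbf{P}(X_k \in B \mid X_0 = x_0,\ X_1 \ne x_0, \ldots, X_{k-1} \ne x_0) \\
&= \sum_{k=1}^{n} \frac{\mathbf{P}(\gamma(n) = k)}{\mathbf{P}(\tau \ge k)}\, P_{x_0}(k, B).
\end{aligned}
$$

For the measure $\pi$ defined in (13.6.15) one has

$$
P(x_0, n, B) - \pi(B)
= \sum_{k=1}^{n} \left(\frac{\mathbf{P}(\gamma(n) = k)}{\mathbf{P}(\tau \ge k)} - \frac{1}{\mathbf{E}\tau}\right) P_{x_0}(k, B)
- \frac{1}{\mathbf{E}\tau} \sum_{k > n} P_{x_0}(k, B).
$$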




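Since $\mathbf{P}(\gamma(n) = k) \le \mathbf{P}(\tau \ge k)$ and $P_{x_0}(k, B) \le \mathbf{P}(\tau \ge k)$ (see the proof of
Theorem 13.4.1), one has, for any $N$,

$$
\sup_B |P(x_0, n, B) - \pi(B)|
\le \sum_{k=1}^{N} \left|\frac{\mathbf{P}(\gamma(n) = k)}{\mathbf{P}(\tau \ge k)} - \frac{1}{\mathbf{E}\tau}\right| \mathbf{P}(\tau \ge k)
+ 2 \sum_{k > N} \mathbf{P}(\tau \ge k).
\tag{13.6.17}
$$

Further, since

$$\mathbf{P}(\gamma(n) = k) \to \frac{\mathbf{P}(\tau \ge k)}{\mathbf{E}\tau}, \qquad \sum_{k=1}^{\infty} \mathbf{P}(\tau \ge k) = \mathbf{E}\tau < \infty,$$

the right-hand side of (13.6.17) can be made arbitrarily small by choosing $N$ and
then $n$. Therefore,

$$\lim_{n\to\infty} \sup_B |P(x_0, n, B) - \pi(B)| = 0.$$

Now consider an arbitrary initial state $x \in \mathcal{X}$, $x \ne x_0$. Since $x_0$ is an atom, for the
probabilities

$$F(x, k, x_0) := \mathbf{P}\big(X_k(x) = x_0,\ X_1(x) \ne x_0, \ldots, X_{k-1}(x) \ne x_0\big)$$

of hitting $x_0$ for the first time on the $k$-th step, one has

$$\sum_{k} F(x, k, x_0) = 1, \qquad P(x, n, B) = \sum_{k=1}^{n} F(x, k, x_0)\, P(x_0, n-k, B),$$

$$
\|P(x, n, \cdot) - \pi(\cdot)\|
\le \sum_{k \le n/2} F(x, k, x_0)\, \|P(x_0, n-k, \cdot) - \pi(\cdot)\| + 2 \sum_{k > n/2} F(x, k, x_0) \to 0
$$

as $n \to \infty$.

Relation (13.6.16) follows from the fact that

$$\|P(\mu_0, n, \cdot) - \pi(\cdot)\| \le \int \mu_0(dx)\, \|P(x, n, \cdot) - \pi(\cdot)\| \to 0$$

by the dominated convergence theorem.
Further, from the convergence of $P(x, n, \cdot)$ in total variation it follows that

$$\int P(x, n, dy)\, P(y, B) \to \int \pi(dy)\, P(y, B).$$

Since the left-hand side of this relation is equal to $P(x, n+1, B)$ by virtue of
(13.6.11) and converges to $\pi(B)$, one has (13.6.13), and hence $\pi$ is an invariant
measure.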


Now assume that $\pi_1$ is another invariant distribution. Then

$$\pi_1(\cdot) = P(\pi_1, n, \cdot) \to \pi(\cdot), \qquad \pi_1 = \pi.$$

The theorem is proved.


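Returning to Example 13.6.2, we show that the conditions of Theorem 13.6.1 are
met provided that $\mathbf{E}\xi_1 < 0$ and $\mathbf{E}\eta_1 < \infty$. Indeed, put

$$\eta(-x) := \min\Big\{k \ge 1 : S_k = \sum_{j=1}^{k} \xi_j \le -x\Big\}.$$

By the renewal Theorem 10.1.1,

$$H(x) := \mathbf{E}\eta(-x) \sim \frac{x}{|\mathbf{E}\xi_1|} \quad \text{as } x \to \infty$$

for $\mathbf{E}\xi_1 < 0$, and therefore there exist constants $c_1$ and $c_2$ such that $H(x) \le c_1 + c_2 x$
for all $x \ge 0$. Hence, for the atom $x_0 = 0$, we obtain that

$$
\mathbf{E}\tau = \int_0^{\infty} \mathbf{P}(\eta_1 \in dx)\, H(x)
\le c_1 + c_2 \int_0^{\infty} x\, \mathbf{P}(\eta_1 \in dx) = c_1 + c_2 \mathbf{E}\eta_1 < \infty.
$$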


13.7* Ergodicity of Harris Markov Chains


13.7.1 The Ergodic Theorem 


In this section we will consider the problem of establishing ergodicity of Markov
chains in arbitrary state spaces $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$. A lot of research has been done on this
problem, the most important advancements being associated with the names of
W. Döblin, J.L. Doob, T.E. Harris and S. Orey. Until recently, this research area
had been considered as a rather difficult one, and not without reason. However, the
construction of an artificial atom suggested by K.B. Athreya, P.E. Ney and E. Nummelin
(see, e.g. [6, 27, 29]) greatly simplified considerations and allowed the proof
of ergodicity by reducing the general case to the special case discussed in the last
section.

In what follows, the notion of a "Harris chain" will play an important role. For a
fixed set $V \in \mathfrak{B}_{\mathcal{X}}$, define the random variable

$$\tau_V(x) := \min\{k \ge 1 : X_k(x) \in V\},$$

the time of the first hitting of $V$ by the chain starting from the state $x$ (we put
$\tau_V(x) = \infty$ if all $X_k(x) \notin V$).




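Definition 13.7.1 A Markov chain $X = \{X_n\}$ in $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ is said to be a Harris
chain (or Harris irreducible) if there exists a set $V \in \mathfrak{B}_{\mathcal{X}}$, a probability measure $\mu$
on $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$, and numbers $n_0 \ge 1$, $p \in (0, 1)$ such that

(I$_0$) $\mathbf{P}(\tau_V(x) < \infty) = 1$ for all $x \in \mathcal{X}$; and
(II) $P(x, n_0, B) \ge p\mu(B)$ for all $x \in V$, $B \in \mathfrak{B}_{\mathcal{X}}$.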


Condition (I$_0$) plays the role of an irreducibility condition: starting from any
point $x \in \mathcal{X}$, the trajectory of $X_n$ will sooner or later visit the set $V$. Condition (II)
guarantees that, after $n_0$ steps since hitting $V$, the distribution of the walking particle
will be minorised by a common "distribution" $p\mu(\cdot)$. This condition is sometimes
called a "mixing condition"; it ensures a "partial loss of memory" about the
trajectory's past. This is not the case for the chain from Example 13.6.1, for which
condition (II) does not hold for any $V$, $\mu$ or $n_0$ (the $P(x, \cdot)$ form a collection of mutually
singular distributions which are singular with respect to Lebesgue measure).

If a chain has an atom $x_0$, then conditions (I$_0$) and (II) are always satisfied for
$V = \{x_0\}$, $n_0 = 1$, $p = 1$, and $\mu(\cdot) = P(x_0, \cdot)$, so that such a chain is a Harris chain.

The set $V$ is usually chosen to be a "compact" set (if $\mathcal{X} = \mathbb{R}^d$, it will be a bounded
set), for otherwise one cannot, as a rule, obtain inequalities in (II). If the space $\mathcal{X}$
is "compact" itself (a finite or bounded subset of $\mathbb{R}^d$), condition (II) can be met
for $V = \mathcal{X}$ (condition (I$_0$) then always holds). For example, if $\{X_n\}$ is a finite,
irreducible and aperiodic chain, then by Theorem 13.4.2 there exists an $n_0$ such that
$P(i, n_0, j) \ge p > 0$ for all $i$ and $j$. Therefore condition (II) holds for $V = \mathcal{X}$ if one
takes $\mu$ to be a uniform distribution on $\mathcal{X}$.
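For a finite chain a pair $(p, \mu)$ satisfying condition (II) with $V = \mathcal{X}$ and $n_0 = 1$ can be computed mechanically. The sketch below does not use the uniform measure mentioned above; it swaps in the normalised column minima of the transition matrix, which give the largest admissible $p$ for that matrix. The function name `minorisation` and the example matrix are assumptions for illustration.

```python
from fractions import Fraction


def minorisation(P):
    """Return (p, mu) with P[i][j] >= p * mu[j] for all i, j (V = X, n0 = 1).

    mu[j] is proportional to the column minimum min_i P[i][j], and p is the sum
    of these minima; at least one column must be bounded away from zero.
    """
    n = len(P)
    col_min = [min(P[i][j] for i in range(n)) for j in range(n)]
    p = sum(col_min)
    if p == 0:
        raise ValueError("no column of P is bounded away from zero")
    mu = [c / p for c in col_min]
    return p, mu


# Hypothetical two-state transition matrix with exact rational entries.
P = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 4), Fraction(3, 4)]]
```

If some entries of the one-step matrix vanish, the same function can be applied to a power of the matrix, which corresponds to taking $n_0 > 1$ in condition (II).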

One could interpret condition (II) as that of the presence, in all distributions
$P(x, n_0, \cdot)$ for $x \in V$, of a component which is absolutely continuous with respect
to the measure $\mu$:

$$\inf_{x \in V} \frac{P(x, n_0, dy)}{\mu(dy)} \ge p > 0.$$

We will also need a condition of "positivity" (positive recurrence) of the set $V$
(or that of "positivity" of the chain):

(I) $\sup_{x \in V} \mathbf{E}\tau_V(x) < \infty$,

and the aperiodicity condition which will be written in the following form. Let
$X_k(\mu)$ be a Markov chain with an initial value $X_0$ distributed according to $\mu$, where
$\mu$ is from condition (II). Put

$$\tau_V(\mu) := \min\{k \ge 0 : X_k(\mu) \in V\}.$$

It is evident that $\tau_V(\mu)$ is, by virtue of (I$_0$), a proper random variable. Denote by
$n_1, n_2, \ldots$ the possible values of $\tau_V(\mu)$, i.e. the values for which

$$\mathbf{P}(\tau_V(\mu) = n_k) > 0, \qquad k = 1, 2, \ldots.$$

Then the aperiodicity condition will have the following form.


(III) There exists a $k \ge 1$ such that

$$\text{g.c.d.}\{n_0 + n_1, n_0 + n_2, \ldots, n_0 + n_k\} = 1,$$

where $n_0$ is from condition (II).

Condition (III) is always satisfied if (II) holds for $n_0 = 1$ and $\mu(V) > 0$ (then
$n_1 = 0$, $n_0 + n_1 = 1$).

Verifying condition (I) usually requires deriving bounds for $\mathbf{E}\tau_V(x)$ for $x \notin V$
which would automatically imply (I$_0$) (see the examples below).


Theorem 13.7.1 Suppose conditions (I$_0$), (I), (II) and (III) are satisfied for a
Markov chain $X$, i.e. the chain is an aperiodic positive Harris chain. Then there
exists a unique invariant distribution $\pi$ such that, for any initial distribution $\mu_0$, as
$n \to \infty$,

$$\|P(\mu_0, n, \cdot) - \pi(\cdot)\| \to 0. \tag{13.7.1}$$

The proof is based on the use of the above-mentioned construction of an "artificial
atom" and reduction of the problem to Theorem 13.6.1. This allows one to
obtain, in the course of the proof, a representation for the invariant measure $\pi$ similar
to (13.6.15) (see (13.7.5)).

A remarkable fact is that the conditions of Theorem 13.7.1 are necessary for
convergence (13.7.1) (for more details, see [6]).


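Proof of Theorem 13.7.1 For simplicity's sake, assume that $n_0 = 1$. First we will
construct an "extended" Markov chain $X^* = \{X_n^*\} = \{X_n, \omega(n)\}$, $\omega(n)$ being a
sequence of independent identically distributed random variables with

$$\mathbf{P}(\omega(n) = 1) = p, \qquad \mathbf{P}(\omega(n) = 0) = 1 - p.$$

The joint distribution of $(X(n), \omega(n))$ in the state space

$$\mathcal{X}^* := \mathcal{X} \times \{0, 1\} = \{x^* = (x, \delta) : x \in \mathcal{X},\ \delta = 0, 1\}$$

and the transition function $P^*$ of the chain $X^*$ are defined as follows (the notation
$X_n^*(x^*)$ has the same meaning as $X_n(x)$):

$$\mathbf{P}\big(X_1^*(x^*) \in (B, \delta)\big) =: P^*\big(x^*, (B, \delta)\big) = P(x, B)\, \mathbf{P}(\omega(1) = \delta) \quad \text{for } x \notin V$$

(i.e., for $X_n \notin V$, the components of $X_{n+1}^*$ are "chosen at random" independently
with the respective marginal distributions). But if $x \in V$, the distribution of
$X_1^*(x^*)$ is given by

$$
\begin{aligned}
\mathbf{P}\big(X_1^*((x, 1)) \in (B, \delta)\big) &= P^*\big((x, 1), (B, \delta)\big) = \mu(B)\, \mathbf{P}(\omega(1) = \delta), \\
\mathbf{P}\big(X_1^*((x, 0)) \in (B, \delta)\big) &= P^*\big((x, 0), (B, \delta)\big) = Q(x, B)\, \mathbf{P}(\omega(1) = \delta),
\end{aligned}
$$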




where

$$Q(x, B) := \big(P(x, B) - p\mu(B)\big)/(1 - p),$$

so that, for any $B \in \mathfrak{B}_{\mathcal{X}}$,

$$p\mu(B) + (1 - p)\, Q(x, B) = P(x, B). \tag{13.7.2}$$

Thus $\mathbf{P}(\omega(n+1) = 1 \mid X_n^*) = p$ for any values of $X_n^*$. However, when "choosing"
the value $X_{n+1}$ there occurs (only when $X_n \in V$) a partial randomisation (or splitting):
for $X_n \in V$, we let $\mathbf{P}(X_{n+1} \in B \mid X_n^*)$ be equal to the value $\mu(B)$ (not depending
on $X_n \in V$!) provided that $\omega(n) = 1$. If $\omega(n) = 0$, then the value of the probability
is taken to be $Q(X_n, B)$. It is evident that, by virtue of condition (II) (for $n_0 = 1$),
$\mu(B)$ and $Q(x, B)$ are probability distributions, and by equality (13.7.2) the first
component $X_n$ of the process $X_n^*$ has the property $\mathbf{P}(X_{n+1} \in B \mid X_n) = P(X_n, B)$,
and therefore the distribution of the first component of $X^*$ coincides with that of the
original sequence $X$.
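The decomposition (13.7.2) is easy to verify on a finite chain. The sketch below builds the residual kernel $Q(x, \cdot)$ for the states of $V$ and raises an error when condition (II) fails, i.e. when some entry would be negative. The function name `residual_kernel`, the matrix `P` and the pair `(p, mu)` are hypothetical choices made for illustration.

```python
from fractions import Fraction


def residual_kernel(P, V, p, mu):
    """Q(x, .) = (P(x, .) - p * mu(.)) / (1 - p) for x in V, as in (13.7.2)."""
    n = len(P)
    Q = {}
    for x in V:
        row = [(P[x][j] - p * mu[j]) / (1 - p) for j in range(n)]
        if any(entry < 0 for entry in row):
            raise ValueError("condition (II) fails at state %d" % x)
        Q[x] = row
    return Q


# Hypothetical data: two states, V = X, n0 = 1.
P = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 4), Fraction(3, 4)]]
p = Fraction(1, 2)
mu = [Fraction(1, 3), Fraction(2, 3)]
Q = residual_kernel(P, [0, 1], p, mu)
```

Each row of `Q` is again a probability distribution, and mixing $\mu$ with weight $p$ and `Q[x]` with weight $1 - p$ reproduces the row `P[x]`, which is exactly why the first component of the split chain has the same law as the original chain.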

As we have already noted, the "extended" process $X^*(n)$ possesses the following
property: the conditional distribution $\mathbf{P}(X_{n+1}^* \in (B, \delta) \mid X_n^*)$ does not depend
on $X^*(n)$ on the set $X_n^* \in V^* := (V, 1)$ and is there the known distribution
$\mu(B)\, \mathbf{P}(\omega(1) = \delta)$. This just means that visits of the chain $X^*$ to the set $V^*$ divide
the trajectory of $X^*$ into independent cycles, in the same way as it happens in the
presence of a positive atom.

We described above how one constructs the distribution of $X^*$ from that of $X$.
Now we will give obvious relations reconstructing the distribution of $X$ from that
of the chain $X^*$:

$$\mathbf{P}(X_n(x) \in B) = p\, \mathbf{P}\big(X_n^*((x, 1)) \in B^*\big) + (1 - p)\, \mathbf{P}\big(X_n^*((x, 0)) \in B^*\big), \tag{13.7.3}$$

where $B^* := (B, 0) \cup (B, 1)$. Note also that, if we consider $X_n$ as a component
of $X_n^*$, we need to write it as a function $X_n(x^*)$ of the initial value $x^* \in \mathcal{X}^*$.
Put

$$\tau^* := \min\{k \ge 1 : X_k^*(x^*) \in V^*\}, \qquad x^* \in V^* = (V, 1).$$

It is obvious that $\tau^*$ does not depend on the value $x^* = (x, 1)$, since $X_1(x^*)$ has
the distribution $\mu$ for any $x \in V$. This property allows one to identify the set $V^*$
with a single point. In other words, one needs to consider one more state space $\mathcal{X}^{**}$
which is obtained from $\mathcal{X}^*$ if we replace the set $V^* = (V, 1)$ by a point to be denoted
by $x_0$. In the new state space, we construct a chain $X^{**}$ equivalent to $X^*$ using the
obvious relations for the transition probability $P^{**}$:

$$
\begin{aligned}
P^{**}\big(x^*, (B, \delta)\big) &:= P^*\big(x^*, (B, \delta)\big) \quad \text{for } x^* \ne (V, 1) = V^*,\ (B, \delta) \ne V^*, \\
P^{**}\big(x_0, (B, \delta)\big) &:= \mu(B)\, \mathbf{P}(\omega(1) = \delta), \qquad P^{**}(x_0, x_0) := p\mu(V).
\end{aligned}
$$

Thus we have constructed a chain $X^{**}$ with the transition function $P^{**}$, and this
chain has atom $x_0$. Clearly, $\tau^* = \min\{k \ge 1 : X_k^{**}(x_0) = x_0\}$. We now prove that this
atom is positive. Put

$$E := \sup_{x \in V} \mathbf{E}\tau_V(x).$$



Lemma 13.7.1 $\mathbf{E}\tau^* \le 2E/p$.

Proof Consider the evolution of the first component $X_k(x^*)$ of the process $X_k^*(x^*)$,
$x^* \in V^*$. Partition the time axis $k \ge 0$ into intervals by hitting the set $V$ by $X_k(x^*)$.
Let $\tau_1 \ge 1$ be the first such hitting time (recall that $X_1(x^*) \overset{d}{=} X_0(\mu)$ has the
distribution $\mu$, so that $\tau_1 = 1$ if $\mu(V) = 1$). Prior to time $\tau_1$ (in the case $\tau_1 > 1$)
transitions of $X_k(x^*)$, $k \ge 2$, were governed by the transition function $P(y, B)$,
$y \in V^c = \mathcal{X} \setminus V$. At time $\tau_1$, according to the definition of $X^*$, one carries out a
Bernoulli trial independent of the past history of the process with success (which
is the event $\omega(\tau_1) = 1$) probability $p$. If $\omega(\tau_1) = 1$ then $\tau^* = \tau_1$. If $\omega(\tau_1) = 0$ then
the transition from $X_{\tau_1}(x^*)$ to $X_{\tau_1 + 1}(x^*)$ is governed by the transition function
$Q(y, B) = (P(y, B) - p\mu(B))/(1 - p)$, $y \in V$. The further evolution of the chain
is similar: if $\tau_1 + \tau_2$ is the time of the second visit of $X_k(x^*)$ to $V$ (in the case
$\omega(\tau_1) = 0$) then in the time interval $[\tau_1 + 1, \tau_1 + \tau_2]$ transitions of $X_k(x^*)$ occur
according to the transition function $P(y, B)$, $y \in V^c$. At time $\tau_1 + \tau_2$ one carries out a new
Bernoulli trial with the outcome $\omega(\tau_1 + \tau_2)$. If $\omega(\tau_1 + \tau_2) = 1$, then $\tau^* = \tau_1 + \tau_2$.
If $\omega(\tau_1 + \tau_2) = 0$, then the transition from $X_{\tau_1 + \tau_2}(x^*)$ to $X_{\tau_1 + \tau_2 + 1}(x^*)$ is
governed by $Q(y, B)$, and so on.

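In other words, the evolution of the component $X_k(x^*)$ of the process $X_k^*(x^*)$ is
as follows. Let $\widetilde{X} = \{\widetilde{X}_k\}$, $k = 1, 2, \ldots$, be a Markov chain with the distribution $\mu$
at time $k = 1$ and transition probability $Q(x, B)$ at times $k \ge 2$,

$$
Q(x, B) :=
\begin{cases}
\big(P(x, B) - p\mu(B)\big)/(1 - p) & \text{if } x \in V, \\
P(x, B) & \text{if } x \in V^c.
\end{cases}
$$

Define $T_i$ as follows:

$$
\begin{aligned}
T_0 &:= 0, \qquad T_1 := \tau_1 = \min\{k \ge 1 : \widetilde{X}_k \in V\}, \\
T_i &:= \tau_1 + \cdots + \tau_i = \min\{k > T_{i-1} : \widetilde{X}_k \in V\}, \qquad i \ge 2.
\end{aligned}
$$

Let, further, $\nu$ be a random variable independent of $\widetilde{X}$ and having the geometric
distribution

$$\mathbf{P}(\nu = k) = (1 - p)^{k-1} p, \quad k \ge 1, \qquad \nu = \min\{k \ge 1 : \omega(T_k) = 1\}. \tag{13.7.4}$$

Then it follows from the aforesaid that the distribution of $X_1(x^*), \ldots, X_{\tau^*}(x^*)$
coincides with that of $\widetilde{X}_1, \ldots, \widetilde{X}_{T_\nu}$; in particular, $\tau^* = T_\nu$, and

$$\mathbf{E}\tau^* = \sum_{k=1}^{\infty} p(1 - p)^{k-1}\, \mathbf{E}T_k.$$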


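Further, since $\mu(B) \le P(x, B)/p$ for $x \in V$, then, for any $x \in V$,

$$
\begin{aligned}
\mathbf{E}\tau_1 &= \mu(V) + \int_{V^c} \mu(du)\big(1 + \mathbf{E}\tau_V(u)\big) \\
&\le \frac{1}{p}\left[P(x, V) + \int_{V^c} P(x, du)\big(1 + \mathbf{E}\tau_V(u)\big)\right]
= \frac{\mathbf{E}\tau_V(x)}{p} \le \frac{E}{p}.
\end{aligned}
$$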

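To bound $\mathbf{E}\tau_i$ for $i \ge 2$, we note that $Q(x, B) \le (1 - p)^{-1} P(x, B)$ for $x \in V$.
Therefore, if we denote by $\mathfrak{F}_{(i)}$ the $\sigma$-algebra generated by $\{\widetilde{X}_k, \omega(\tau_k)\}$ for $k \le T_i$,
then

$$
\begin{aligned}
\mathbf{E}(\tau_i \mid \mathfrak{F}_{(i-1)})
&\le \sup_{x \in V}\left[Q(x, V) + \int_{V^c} Q(x, du)\big(1 + \mathbf{E}\tau_V(u)\big)\right] \\
&\le \frac{1}{1 - p} \sup_{x \in V}\left[P(x, V) + \int_{V^c} P(x, du)\big(1 + \mathbf{E}\tau_V(u)\big)\right] \\
&= (1 - p)^{-1} \sup_{x \in V} \mathbf{E}\tau_V(x) = E(1 - p)^{-1}.
\end{aligned}
$$

This implies the inequality $\mathbf{E}T_k \le E\big(1/p + (k - 1)/(1 - p)\big)$, from which we obtain
that

$$\mathbf{E}\tau^* \le E\left(\frac{1}{p} + p \sum_{k=1}^{\infty} (k - 1)(1 - p)^{k-2}\right) = \frac{2E}{p}.$$

The lemma is proved.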
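The last step of the proof uses the identity $\sum_{k \ge 1} p(1 - p)^{k-1}\big(1/p + (k - 1)/(1 - p)\big) = 2/p$, i.e. the bound on $\mathbf{E}T_k$ averaged over the geometric distribution (13.7.4). A quick numerical check is sketched below; the function name `series_bound` and the truncation level are illustrative assumptions.

```python
def series_bound(p, n_terms=5000):
    """Partial sum of sum_{k>=1} p (1-p)^(k-1) * (1/p + (k-1)/(1-p)), 0 < p < 1."""
    total = 0.0
    for k in range(1, n_terms + 1):
        weight = p * (1 - p) ** (k - 1)          # P(nu = k) from (13.7.4)
        total += weight * (1 / p + (k - 1) / (1 - p))
    return total
```

The first summand contributes $1/p$ and the second contributes $\mathbf{E}(\nu - 1)/(1 - p) = 1/p$, so the partial sums approach $2/p$ as more terms are included.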


We return to the proof of the theorem. To make use of Theorem 13.6.1, we now
have to show that $\mathbf{P}(\tau^*(x^*) < \infty) = 1$ for any $x^* \in \mathcal{X}^*$, where

$$\tau^*(x^*) := \min\{k \ge 1 : X_k^*(x^*) \in V^*\}.$$

But the chain $X$ visits $V$ with probability 1. After $\nu$ visits to $V$ ($\nu$ was defined in
(13.7.4)), the process $X^* = (X(n), \omega(n))$ will be in the set $V^*$.

The aperiodicity condition for $n_0 = 1$ will be met if $\mu(V) > 0$. In that case we
obtain by virtue of Theorem 13.6.1 that there exists a unique invariant measure $\pi^*$
such that, for any $x^* \in \mathcal{X}^*$,

$$
\|P^*(x^*, n, \cdot) - \pi^*(\cdot)\| \to 0, \qquad
\pi^*\big((B, \delta)\big) = \frac{1}{\mathbf{E}\tau^*} \sum_{k=1}^{\infty} P^*_{V^*}\big(k, (B, \delta)\big),
$$

$$
P^*_{V^*}\big(k, (B, \delta)\big) = \mathbf{P}\big(X_k^*(x^*) \in (B, \delta),\ X_1^*(x^*) \notin V^*, \ldots, X_{k-1}^*(x^*) \notin V^*\big).
\tag{13.7.5}
$$

In the last equality, we can take any point $x^* \in V^*$; the probability does not depend
on the choice of $x^* \in V^*$.

From this and the "inversion formula" (13.7.3) we obtain assertion (13.7.1) and
a representation for the invariant measure $\pi$ of the process $X$.

The proof of the convergence $\|P(\mu_0, n, \cdot) - \pi(\cdot)\| \to 0$ and uniqueness of the
invariant measure is exactly the same as in Theorem 13.6.1 (these facts also follow
from the respective assertions for $X^*$).




Verifying the conditions of Theorem 13.6.1 in the case where $n_0 > 1$ or $\mu(V) = 0$
causes no additional difficulties and we leave it to the reader.
The theorem is proved.


Note that in a way similar to that in the proof of Theorem 13.4.1, one could also
establish the uniqueness of the solution to the integral equation for the invariant
measure (see Definition 13.6.6) in a wider class of signed finite measures.

The main and most difficult to verify conditions of Theorem 13.7.1 are undoubtedly
conditions (I) and (II). Condition (I$_0$) is usually obtained "automatically", in
the course of verifying condition (I), for the latter requires bounding $\mathbf{E}\tau_V(x)$ for
all $x$. Verifying the aperiodicity condition (III) usually causes no difficulties. If, say,
recurrence to the set $V$ is possible in $m_1$ and $m_2$ steps and g.c.d.$(m_1, m_2) = 1$, then
the chain is aperiodic.


13.7.2 On Conditions (I) and (II)


Now we consider in more detail the main conditions (I) and (II). Condition (II) is
expressed directly in terms of local characteristics of the chain (transition probabilities
in one or a fixed number of steps $n_0 \ge 1$), and in this sense it could be treated as
a "final" one. One only needs to "guess" the most appropriate set $V$ and measure $\mu$
(of course, if there are any). For example, for multi-dimensional Markov chains in
$\mathcal{X} = \mathbb{R}^d$, condition (II) will be satisfied if at least one of the following two conditions
is met.

(II$_a$) The distribution of $X_{n_0}(x)$ has, for some $n_0$ and $N > 0$ and all $x \in V_N :=
\{y : |y| \le N\}$, a component which is absolutely continuous with respect to Lebesgue
measure (or to the sum of the Lebesgue measures on $\mathbb{R}^d$ and its "coordinate"
subspaces) and is "uniformly" positive on the set $V_M$ for some $M > 0$. In this case, one
can take $\mu$ to be the uniform distribution on $V_M$.

(II$_l$) $\mathcal{X} = \mathbb{Z}^d$ is the integer lattice in $\mathbb{R}^d$. In this case the chain is countable and
everything simplifies (see Sect. 13.4).

We have already noted that, in the cases when a chain has a positive atom, which
is the case in Example 13.6.2, no assumptions about the structure (smoothness) of
the distribution of $X_{n_0}(x)$ are needed.

The "positivity" condition (I) is different. It is given in terms of rather complicated
characteristics $\mathbf{E}\tau_V(x)$ requiring additional analysis and a search for conditions
in terms of local characteristics which would ensure (I). The rest of the section
will mostly be devoted to this task.

First of all, we will give an "intermediate" assertion which will be useful for the
sequel. We have already made use of such an assertion in Example 13.6.2.


Theorem 13.7.2 Suppose there exists a nonnegative measurable function
$g : \mathcal{X} \to \mathbb{R}$ such that the following conditions (I$_g$) are met:

(I$_g$)$_1$ $\mathbf{E}\tau_V(x) \le c_1 + c_2 g(x)$ for $x \in V^c = \mathcal{X} \setminus V$, $c_1, c_2 = \mathrm{const}$.

(I$_g$)$_2$ $\sup_{x \in V} \mathbf{E} g(X_1(x)) < \infty$.

Then conditions (I$_0$) and (I) are satisfied.


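The function $g$ from Theorem 13.7.2 is often called the test, or Lyapunov, function.
For brevity’s sake, put $\tau_V(x) := \tau(x)$.


Proof If (I$_g$) holds then, for $x \in V$,
\[
\begin{aligned}
\mathbf{E}\tau(x) &\le 1 + \mathbf{E}\bigl[\tau(X_1(x));\ X_1(x) \in V^c\bigr] \\
&\le 1 + \mathbf{E}\bigl(\mathbf{E}[\tau(X_1(x)) \mid X_1(x)];\ X_1(x) \in V^c\bigr) \\
&\le 1 + \mathbf{E}\bigl(c_1 + c_2 g(X_1(x));\ X_1(x) \in V^c\bigr) \\
&\le 1 + c_1 + c_2 \sup_{x \in V} \mathbf{E} g(X_1(x)) < \infty.
\end{aligned}
\]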


The theorem is proved. 


Note that condition (I$_g$)$_2$, like condition (II), refers to “local” characteristics of
the system, and in that sense it can also be treated as a “final” condition (up to the
choice of function $g$).

We now consider conditions ensuring (I$_g$)$_1$. The processes
\[
\{X_n\} = \{X_n(x)\}, \qquad X_0(x) = x,
\]


to be considered below (for instance, in Theorem 13.7.3) do not need to be Marko- 
vian. We will only use those properties of the processes which will be stated in 
conditions of assertions. 

We will again make use of nonnegative trial functions g : X — R and consider a 
set V “induced” by the function g and a set U which in most cases will be a bounded 
interval of the real line: 


\[
V := g^{-1}(U) = \{x \in \mathcal{X} : g(x) \in U\}.
\]
The notation $\tau(x) = \tau_V(x)$ will retain its meaning:
\[
\tau(x) := \min\{k \ge 1 : g(X_k(x)) \in U\} = \min\{k \ge 1 : X_k(x) \in V\}.
\]


The next assertion is an essential element of Lyapunov’s (or the test functions) 
approach to the proof of positive recurrence of a Markov chain. 


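Theorem 13.7.3 If $\{X_n\}$ is a Markov chain and, for some $\varepsilon > 0$ and all $x \in V^c$,
\[
\mathbf{E} g(X_1(x)) - g(x) \le -\varepsilon, \tag{13.7.6}
\]
then $\mathbf{E}\tau(x) \le g(x)/\varepsilon$ and therefore (I$_g$)$_1$ holds.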


To prove the theorem we need 




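Lemma 13.7.2 If, for some $\varepsilon > 0$, all $n = 0, 1, 2, \ldots$, and any $x \in V^c$,
\[
\mathbf{E}\bigl(g(X_{n+1}) - g(X_n) \mid \tau(x) > n\bigr) \le -\varepsilon, \tag{13.7.7}
\]
then
\[
\mathbf{E}\tau(x) \le \frac{g(x)}{\varepsilon}, \qquad x \in V^c,
\]
and therefore (I$_g$)$_1$ holds.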


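Proof Put $\tau(x) := \tau$ for brevity and set
\[
\tau_{(N)} := \min(\tau, N), \qquad \Delta(n) := g(X_{n+1}) - g(X_n).
\]
We have
\[
\begin{aligned}
-g(x) = -\mathbf{E} g(X_0) &\le \mathbf{E}\bigl(g(X_{\tau_{(N)}}) - g(X_0)\bigr) \\
&= \mathbf{E} \sum_{n=0}^{\tau_{(N)}-1} \Delta(n) = \sum_{n=0}^{N-1} \mathbf{E}\,\Delta(n)\,\mathrm{I}(\tau > n) \\
&= \sum_{n=0}^{N-1} \mathbf{P}(\tau > n)\,\mathbf{E}\bigl(\Delta(n) \mid \tau > n\bigr) \le -\varepsilon \sum_{n=0}^{N-1} \mathbf{P}(\tau > n).
\end{aligned}
\]
This implies that, for any $N$,
\[
\sum_{n=0}^{N-1} \mathbf{P}(\tau > n) \le \frac{g(x)}{\varepsilon}.
\]
Therefore this inequality will also hold for $N = \infty$, so that $\mathbf{E}\tau \le g(x)/\varepsilon$. The lemma
is proved.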


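Proof of Theorem 13.7.3 The proof follows in an obvious way from the fact that, by
(13.7.6) and the homogeneity of the chain, $\mathbf{E}(g(X_{n+1}) - g(X_n) \mid X_n) \le -\varepsilon$ holds on
$\{X_n \in V^c\}$, and from the inclusion $\{\tau > n\} \subset \{X_n \in V^c\}$, so that
\[
\mathbf{E}\bigl(g(X_{n+1}) - g(X_n);\ \tau > n\bigr) = \mathbf{E}\bigl[\mathbf{E}(g(X_{n+1}) - g(X_n) \mid X_n);\ \tau > n\bigr] \le -\varepsilon\,\mathbf{P}(\tau > n).
\]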


The theorem is proved. 
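
The bound $\mathbf{E}\tau(x) \le g(x)/\varepsilon$ of Theorem 13.7.3 can be checked exactly on a small chain. The following is only a sketch under assumptions that are not part of the text: a hypothetical chain on $\{0, \ldots, M\}$ that moves down with probability `p`, up with probability `q` and otherwise stays put (at $M$ the upward move is replaced by staying put), with $g(x) = x$, $V = \{0\}$ and $\varepsilon = p - q$.

```python
from fractions import Fraction

def expected_hitting_times(p, q, M):
    """Exact E tau(x), x = 1..M, for the time to reach V = {0}.

    Hypothetical chain on {0, ..., M}: from 0 < x < M it moves down with
    probability p, up with probability q, and stays put otherwise; at
    x = M the upward move is replaced by staying put.
    """
    # t[x] = expected time to move from x to x - 1.
    t = [Fraction(0)] * (M + 1)
    t[M] = 1 / p                      # geometric number of attempts at the top
    for x in range(M - 1, 0, -1):
        # First step: t[x] = 1 + q * (t[x + 1] + t[x]) + (1 - p - q) * t[x].
        t[x] = (1 + q * t[x + 1]) / p
    hit = [Fraction(0)] * (M + 1)
    for x in range(1, M + 1):
        hit[x] = hit[x - 1] + t[x]    # reaching 0 from x passes through x - 1
    return hit

p, q, M = Fraction(3, 5), Fraction(1, 5), 20
eps = p - q        # drift of g(x) = x is q - p = -eps inside, -p <= -eps at M
hit = expected_hitting_times(p, q, M)
for x in (1, 5, 20):
    # E tau(x) next to the bound g(x) / eps
    print(x, float(hit[x]), float(x / eps))
```

Exact rational arithmetic is used so that the comparison with $g(x)/\varepsilon$ is not blurred by rounding.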


Theorem 13.7.3 is a modification of the positive recurrence criterion known as
the Foster-Moustafa-Tweedie criterion (see, e.g., [6, 27]).

Consider some applications of the obtained results. Let $X$ be a Markov chain on
the real half-axis $\mathbb{R}_+ = [0, \infty)$. For brevity’s sake, put $\xi(x) := X_1(x) - x$. This is
the one-step increment of the chain starting at the point $x$; we could also define $\xi(x)$
as a random variable with the distribution
\[
\mathbf{P}(\xi(x) \in B) = P(x, B + x) \qquad (B + x := \{y \in \mathcal{X} : y - x \in B\}).
\]


Corollary 13.7.1 If, for some $N \ge 0$ and $\varepsilon > 0$,
\[
\sup_{x \le N} \mathbf{E}\xi(x) < \infty, \qquad \sup_{x > N} \mathbf{E}\xi(x) \le -\varepsilon, \tag{13.7.8}
\]
then conditions (I$_0$) and (I) hold for $V = [0, N]$.


Proof Make use of Theorems 13.7.2, 13.7.3 and Corollary 13.3.1 with $g(x) = x$,
$V = [0, N]$. Conditions (I$_g$)$_2$ and (13.7.6) are clearly satisfied.


Thus the presence of a “negative drift” in the region $x > N$ guarantees positivity
of the chain. However, condition (I) can also be ensured when the
“drift” $\mathbf{E}\xi(x)$ vanishes as $x \to \infty$.

Corollary 13.7.2 Let $\sup_x \mathbf{E}\xi^2(x) < \infty$ and
\[
\mathbf{E}\xi^2(x) \le \beta, \qquad \mathbf{E}\xi(x) \le -\frac{c}{x} \quad \text{for } x > N.
\]
If $2c > \beta$ then conditions (I$_0$) and (I) hold for $V = [0, N]$.


Proof We again make use of Theorems 13.7.2 and 13.7.3, but with $g(x) = x^2$. We
have for $x > N$:
\[
\mathbf{E} g(X_1(x)) - g(x) = \mathbf{E}\bigl(2x\xi(x) + \xi^2(x)\bigr) \le -2c + \beta < 0.
\]


Before proceeding to examples related to ergodicity we note the following. The
“larger” the set $V$ the easier it is to verify condition (I), and the “smaller” that set,
the easier it is to verify condition (II). In this connection there arises the question
of when one can consider two sets: a “small” set $W$ and a “large” set $V \supset W$ such
that if (I) holds for $V$ and (II) holds for $W$ then both (I) and (II) would hold for $W$.
Under conditions of Sect. 13.6 one can take $W$ to be a “one-point” atom $x_0$.


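Lemma 13.7.3 Let sets $V$ and $W$ be such that the condition
\[
\text{(I}_V\text{)} \qquad E := \sup_{x \in V} \mathbf{E}\tau_V(x) < \infty
\]
holds and there exists an $m$ such that
\[
\inf_{x \in V} \mathbf{P}\Bigl(\bigcup_{j=1}^{m} \{X_j(x) \in W\}\Bigr) \ge q > 0.
\]
Then the following condition is also met:
\[
\text{(I}_W\text{)} \qquad \sup_{x \in W} \mathbf{E}\tau_W(x) \le \sup_{x \in V} \mathbf{E}\tau_W(x) \le \frac{mE}{q}.
\]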




Thus, under the assumptions of Lemma 13.7.3, if condition (I) holds for $V$ and
condition (II) holds for $W$, then conditions (I) and (II) hold for $W$.

To prove Lemma 13.7.3, we will need the following assertion extending (in the 
form of an inequality) the well-known Wald identity. 

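Assume we are given a sequence of nonnegative random variables $\tau_1, \tau_2, \ldots$
which are measurable with respect to $\sigma$-algebras $\mathfrak{U}_1 \subset \mathfrak{U}_2 \subset \cdots$, respectively, and
let $T_n := \tau_1 + \cdots + \tau_n$. Furthermore, let $\nu$ be a given stopping time with respect to
$\{\mathfrak{U}_n\}$: $\{\nu \le n\} \in \mathfrak{U}_n$.


Lemma 13.7.4 If $\mathbf{E}(\tau_n \mid \mathfrak{U}_{n-1}) \le a$ then $\mathbf{E} T_\nu \le a\,\mathbf{E}\nu$.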


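Proof We can assume without loss of generality that $\mathbf{E}\nu < \infty$ (otherwise the
inequality is trivial). The proof essentially repeats that of Theorem 4.4.1. One has
\[
\mathbf{E} T_\nu = \sum_{k=1}^{\infty} \mathbf{E}(T_k;\ \nu = k) = \sum_{k=1}^{\infty} \mathbf{E}(\tau_k;\ \nu \ge k). \tag{13.7.9}
\]
Changing the summation order here is well-justified, for the summands are nonnegative.
Further, $\{\nu \le k - 1\} \in \mathfrak{U}_{k-1}$ and hence $\{\nu \ge k\} \in \mathfrak{U}_{k-1}$. Therefore
\[
\mathbf{E}(\tau_k;\ \nu \ge k) = \mathbf{E}\bigl[\mathrm{I}(\nu \ge k)\,\mathbf{E}(\tau_k \mid \mathfrak{U}_{k-1})\bigr] \le a\,\mathbf{P}(\nu \ge k).
\]
Comparing this with (13.7.9) we get
\[
\mathbf{E} T_\nu \le a \sum_{k=1}^{\infty} \mathbf{P}(\nu \ge k) = a\,\mathbf{E}\nu.
\]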


The lemma is proved. 


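Proof of Lemma 13.7.3 Suppose the chain starts at a point $x \in V$. Consider the
times $T_1, T_2, \ldots$ of successive visits of $X$ to $V$, $T_0 = 0$. Put $Y_0 := x$, $Y_k := X_{T_k}(x)$,
$k = 1, 2, \ldots$. Then, by virtue of the strong Markov property, the sequence $(Y_k, T_k)$
will form a Markov chain. Set $\mathfrak{U}_k := \sigma(T_1, \ldots, T_k;\ Y_1, \ldots, Y_k)$, $\tau_k := T_k - T_{k-1}$,
$k = 1, 2, \ldots$. Then $\nu := \min\{k : Y_k \in W\}$ is a stopping time with respect to $\{\mathfrak{U}_k\}$. It is
evident that $\mathbf{E}(\tau_k \mid \mathfrak{U}_{k-1}) \le E$. Bound $\mathbf{E}\nu$. We have
\[
\begin{aligned}
p_k := \mathbf{P}(\nu > km) &\le \mathbf{P}\Bigl(\bigcap_{j=1}^{T_{km}} \{X_j \notin W\}\Bigr) \\
&= \mathbf{E}\Bigl[\mathrm{I}\Bigl(\bigcap_{j=1}^{T_{(k-1)m}} \{X_j \notin W\}\Bigr)\, \mathbf{E}\Bigl(\mathrm{I}\Bigl(\bigcap_{j=T_{(k-1)m}+1}^{T_{km}} \{X_j \notin W\}\Bigr) \Bigm| \mathfrak{U}_{(k-1)m}\Bigr)\Bigr].
\end{aligned}
\]
Since $\tau_j \ge 1$, the last factor, by the assumptions of the lemma and the strong
Markov property, does not exceed
\[
\mathbf{P}\Bigl(\bigcap_{j=1}^{m} \bigl\{\widetilde{X}_j(X_{T_{(k-1)m}}) \notin W\bigr\}\Bigr) \le 1 - q,
\]
where, as before, $\widetilde{X}_k(x)$ is a chain with the same distribution as $X_k(x)$ but
independent of the latter chain. Thus $p_k \le (1 - q)p_{k-1} \le (1 - q)^k$, $\mathbf{E}\nu \le m/q$, and
by Lemma 13.7.4 we have $\mathbf{E} T_\nu \le Em/q$. It remains to notice that $\tau_W(x) = T_\nu$. The
lemma is proved.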


Example 13.7.1 A random walk with reflection. Let $\xi_1, \xi_2, \ldots$ be independent
identically distributed random variables,
\[
X_{n+1} := |X_n + \xi_{n+1}|, \qquad n = 0, 1, \ldots. \tag{13.7.10}
\]
If the $\xi_k$ and hence the $X_k$ are non-arithmetic, then the chain $X$ has, generally
speaking, no atoms. If, for instance, $\xi_k$ have a density $f(t)$ with respect to Lebesgue
measure then $\mathbf{P}(X_k(x) = y) = 0$ for any $x$, $y$, $k \ge 1$. We will assume that a broader
condition (A) holds:

(A). In the decomposition
\[
\mathbf{P}(\xi_k < t) = p_a F_a(t) + p_c F_c(t)
\]
of the distribution of $\xi_k$ into the absolutely continuous ($F_a$) and singular ($F_c$)
(including discrete) components, one has $p_a > 0$.


Corollary 13.7.3 If condition (A) holds, $a = \mathbf{E}\xi_k < 0$, and $\mathbf{E}|\xi_k| < \infty$, then the
Markov chain defined in (13.7.10) satisfies the conditions of Theorem 13.7.2 and
therefore is ergodic in the sense of convergence in total variation.


Proof We first verify that the chain satisfies the conditions of Corollary 13.7.1.
Since in our case $|X_1(x) - x| \le |\xi_1|$, the first of conditions (13.7.8) is satisfied.
Further,
\[
\mathbf{E}\xi(x) = \mathbf{E}|x + \xi_1| - x = \mathbf{E}(\xi_1;\ \xi_1 \ge -x) - \mathbf{E}(2x + \xi_1;\ \xi_1 < -x) \to \mathbf{E}\xi_1
\]
as $x \to \infty$, since
\[
x\,\mathbf{P}(\xi_1 < -x) \le \mathbf{E}(|\xi_1|;\ |\xi_1| > x) \to 0.
\]
Hence there exists an $N$ such that $\mathbf{E}\xi(x) \le a/2 < 0$ for $x > N$. This proves that
conditions (I$_0$) and (I) hold for $V = [0, N]$.

Now verify that condition (II) holds for the set $W = [0, h]$ with some $h$. Let $f(t)$
be the density of the distribution $F_a$ from condition (A). There exist an $f_0 > 0$ and
a segment $[t_1, t_2]$, $t_2 > t_1$, such that $f(t) > f_0$ for $t \in [t_1, t_2]$. The density of $x + \xi_k$
will clearly be greater than $f_0$ on $[x + t_1, x + t_2]$. Put $h := (t_2 - t_1)/2$. Then, for
$0 \le x \le h$, one will have $[t_2 - h, t_2] \subset [x + t_1, x + t_2]$.

Suppose first that $t_2 > 0$. The aforesaid will then mean that the density of $x + \xi_k$
will be greater than $f_0$ on $[(t_2 - h)^+, t_2]$ for all $x \le h$ and, therefore,
\[
\inf_{x \le h} \mathbf{P}(X_1(x) \in B) \ge p_a \int_B f_0(t)\,dt,
\]
where
\[
f_0(t) := \begin{cases} f_0 & \text{if } t \in [(t_2 - h)^+, t_2], \\ 0 & \text{otherwise.} \end{cases}
\]
This means that condition (II) is satisfied on the set $W = [0, h]$. The case $t_2 \le 0$ can
be considered in a similar way.

It remains to make use of Lemma 13.7.3 which implies that condition (I) will
hold for the set $W$. The condition of Lemma 13.7.3 is clearly satisfied (for sufficiently
large $m$, the distribution of $X_m(x)$, $x \le N$, will have an absolutely continuous
component which is positive on $W$). For the same reason, the chain $X$ cannot be
periodic. Thus all conditions of Theorem 13.7.2 are met. The corollary is proved.
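
The behaviour of the drift $\mathbf{E}\xi(x) = \mathbf{E}|x + \xi_1| - x$ used at the start of this proof can be seen on a concrete case. The sketch below assumes, purely for illustration, that $\xi_1$ is uniform on $[-2, 1]$ (so $a = \mathbf{E}\xi_1 = -1/2$); this distribution is not taken from the text. The expectation is computed in closed form from the antiderivative $u|u|/2$ of $|u|$.

```python
def abs_primitive(u):
    """Antiderivative of |u|: F(u) = u * |u| / 2."""
    return u * abs(u) / 2.0

def drift(x, lo=-2.0, hi=1.0):
    """E|x + xi| - x for xi uniform on [lo, hi] (an illustrative choice)."""
    mean_abs = (abs_primitive(x + hi) - abs_primitive(x + lo)) / (hi - lo)
    return mean_abs - x

# Near the reflecting barrier the drift is positive; once x is at least 2
# the reflection no longer acts and the drift equals E xi = -0.5.
for x in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(x, drift(x))
```

In this illustration the drift is bounded near zero and equals $a$ for all $x \ge 2$, which is the pattern required by (13.7.8).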


Example 13.7.2 An oscillating random walk. Suppose we are given two independent
sequences $\xi_1, \xi_2, \ldots$ and $\eta_1, \eta_2, \ldots$ of independent random variables, identically
distributed in each of the sequences. Put
\[
X_{n+1} := \begin{cases} X_n + \xi_{n+1} & \text{if } X_n \ge 0, \\ X_n + \eta_{n+1} & \text{if } X_n < 0. \end{cases} \tag{13.7.11}
\]
Such a random walk is called oscillating. It clearly forms a Markov chain in the
state space $\mathcal{X} = (-\infty, \infty)$.


Corollary 13.7.4 If at least one of the distributions of $\xi_k$ or $\eta_k$ satisfies condition
(A) and $-\infty < \mathbf{E}\xi_k < 0$, $\infty > \mathbf{E}\eta_k > 0$, then the chain (13.7.11) will satisfy the
conditions of Theorem 13.7.2 and therefore will be ergodic.


Proof The argument is quite similar to the proof of Corollary 13.7.3. One just needs
to take, in order to verify condition (I), $g(x) = |x|$ and $V = [-N, N]$. After that it
remains to make use of Lemma 13.7.3 with $W = [0, h]$ if condition (A) is satisfied
for $\xi_k$ (and with $W = [-h, 0)$ if it is met for $\eta_k$).


Note that condition (A) in Examples 13.7.1 and 13.7.2 can be relaxed to that of
the existence of an absolutely continuous component for the distribution of the sum
$\sum_{j=1}^{m} \xi_j$ (or $\sum_{j=1}^{m} \eta_j$) for some $m$. On the other hand, if the distributions of these
sums are singular for all $m$, then convergence of distributions $P(x, n, \cdot)$ in total variation
cannot take place. If, for instance, one has $\mathbf{P}(\xi_k = -\sqrt{2}) = \mathbf{P}(\xi_k = 1) = 1/2$ in
Example 13.7.1, then $\mathbf{E}\xi_k < 0$ and condition (I) will be met, while condition (II) will
not. Convergence of $P(x, n, \cdot)$ in total variation to the limiting distribution $\pi$ is also
impossible. Indeed, it follows from the equation for the invariant distribution $\pi$ that
this distribution is necessarily continuous. On the other hand, say, the distributions
$P(0, n, \cdot)$ are concentrated on the countable set $\mathcal{N}$ of the numbers $|{-k\sqrt{2}} + l|$;
$k, l = 1, 2, \ldots$. Therefore $P(0, n, \mathcal{N}) = 1$ for all $n$, $\pi(\mathcal{N}) = 0$. Hence only weak
convergence of the distributions $P(x, n, \cdot)$ to $\pi(\cdot)$ may take place. And although this
convergence does not raise any doubts, we know no reasonably simple proof of this
fact.


Example 13.7.3 (continuation of Examples 13.4.2 and 13.6.1) Let $\mathcal{X} = [0, 1]$,
$\xi_1, \xi_2, \ldots$ be independent and identically distributed, and $X_{n+1} := X_n + \xi_{n+1}$
(mod 1) or, which is the same, $X_{n+1} := \{X_n + \xi_{n+1}\}$, where $\{x\}$ denotes the fractional
part of $x$. Here, condition (I) is clearly met for $V = \mathcal{X} = [0, 1]$. If the $\xi_k$ satisfy
condition (A) then, as was the case in Example 13.7.1, condition (II) will be met for
the set $W = [0, h]$ with some $h > 0$, which, together with Lemma 13.7.3, will mean,
as before, that the conditions of Theorem 13.7.2 are satisfied. The invariant distribution
$\pi$ will in this example be uniform on $[0, 1]$. For simplicity’s sake, we can
assume that the distribution of $\xi_k$ has a density $f(t)$, and without loss of generality
we can suppose that $\xi_k \in [0, 1]$ ($f(t) = 0$ for $t \notin [0, 1]$). Then the density $p(x) \equiv 1$
of the invariant measure $\pi$ will satisfy the equation for the invariant measure:
\[
p(x) = 1 = \int_0^x f(x - y)\,dy + \int_x^1 f(x - y + 1)\,dy = \int_0^1 f(y)\,dy.
\]
Since the stationary distribution is unique, one has $\pi = \mathbf{U}_{0,1}$. Moreover, by Theorem
A3.4.1 of Appendix 3, along with convergence of $P(x, n, \cdot)$ to $\mathbf{U}_{0,1}$ in total
variation, convergence of the densities $P(x, n, dt)/dt$ to 1 in (Lebesgue) measure
will take place.

The fact that the invariant distribution is uniform remains true for arbitrary
non-lattice distributions of $\xi_k$. However, as we have already mentioned in Example
13.6.1, in the general case (without condition (A)) only weak convergence of
the distributions $P(x, n, \cdot)$ to the uniform distribution is possible (see [6, 18]).
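
A discrete analogue of Example 13.7.3 can be checked directly. The sketch below is not the continuous chain of the example: it assumes a hypothetical walk on the finite cyclic group $\{0, 1, \ldots, 11\}$ with an arbitrarily chosen jump distribution, and merely illustrates that the uniform distribution is invariant under addition modulo the group size and that, for an aperiodic jump law, the distribution started from a point approaches it in total variation.

```python
def step(dist, jump):
    """One transition of X_{n+1} = (X_n + xi_{n+1}) mod n_states."""
    n_states = len(dist)
    new = [0.0] * n_states
    for x, mass in enumerate(dist):
        for s, prob in jump.items():
            new[(x + s) % n_states] += mass * prob
    return new

def tv_to_uniform(dist):
    """Total variation distance between dist and the uniform distribution."""
    u = 1.0 / len(dist)
    return 0.5 * sum(abs(d - u) for d in dist)

n_states = 12
jump = {1: 0.5, 2: 0.3, 5: 0.2}           # hypothetical jump distribution

uniform = [1.0 / n_states] * n_states
after_one_step = step(uniform, jump)       # stays uniform: invariance

dist = [1.0] + [0.0] * (n_states - 1)      # start at the point 0
for _ in range(200):
    dist = step(dist, jump)

# Both distances are zero up to rounding error.
print(tv_to_uniform(after_one_step), tv_to_uniform(dist))
```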


13.8 Laws of Large Numbers and the Central Limit Theorem for 
Sums of Random Variables Defined on a Markov Chain 


13.8.1 Random Variables Defined on a Markov Chain 


Let, as before, $X = \{X_n\}$ be a Markov chain in an arbitrary measurable state space
$(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ defined in Sect. 13.6, and let a measurable function $f : \mathcal{X} \to \mathbb{R}$ be given
on $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$. The sequence of sums
\[
S_n := \sum_{k=1}^{n} f(X_k) \tag{13.8.1}
\]
is a generalisation of the random walks that were studied in Chaps. 8 and 11. One
can consider an even more general problem on the behaviour of sums of random
variables defined on a Markov chain. Namely, we will assume that a collection
of distributions $\{F_x\}$ is given which depend on the parameter $x \in \mathcal{X}$. If $F_x^{(-1)}(t)$
is the quantile transform of $F_x$ and $\omega$ has the uniform distribution $\mathbf{U}_{0,1}$, then $\xi_x := F_x^{(-1)}(\omega)$ will have the
distribution $F_x$ (see Sect. 3.2.4).

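The mapping $F_x$ of the space $\mathcal{X}$ into the set of distributions is assumed to be such
that the function $\xi_x(t) = F_x^{(-1)}(t)$ is measurable on $\mathcal{X} \times \mathbb{R}$ with respect to $\mathfrak{B}_{\mathcal{X}} \times \mathfrak{B}$,
where $\mathfrak{B}$ is the $\sigma$-algebra of Borel sets on the real line. In this case, $\xi_x(\omega)$ will be a
random variable such that the moments
\[
\mathbf{E}\xi_x^s = \int_{-\infty}^{\infty} v^s\,dF_x(v) = \int_0^1 \bigl[F_x^{(-1)}(u)\bigr]^s\,du
\]
are measurable with respect to $\mathfrak{B}_{\mathcal{X}}$ (and hence will be random variables themselves
if we set a distribution on $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$).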


Definition 13.8.1 If the $\omega_i$, each having the uniform distribution $\mathbf{U}_{0,1}$, are independent then the sequence
\[
\xi_{X_n} := F_{X_n}^{(-1)}(\omega_n), \qquad n = 0, 1, \ldots,
\]
is called a sequence of random variables defined on the Markov chain $\{X_n\}$.

The basic objects of study in this section are the asymptotic properties of the
distributions of the sums
\[
S_n := \sum_{k=0}^{n} \xi_{X_k}. \tag{13.8.2}
\]
If the distribution $F_x$ is degenerate and concentrated at the point $f(x)$ then
(13.8.2) turns into the sum (13.8.1). If the chain $X$ is countable with states
$E_0, E_1, \ldots$ and $f(x) = \mathrm{I}(E_j)$ then $S_n = m_j(n)$ is the number of visits to the state
$E_j$ by the time $n$ considered in Theorem 13.4.4.


13.8.2 Laws of Large Numbers 


In this and the next subsection we will confine ourselves to Markov chains satisfying
the ergodicity conditions from Sects. 13.6 and 13.7. As was already noticed,
ergodicity conditions for Harris chains mean, in essence, the existence of a positive
atom (possibly in the extended state space). Therefore, for the sake of simplicity, we
will assume from the outset that the chain $X$ has a positive atom at a point $x_0$ and
put, as before,
\[
\tau(x) := \min\{k > 0 : X_k(x) = x_0\}, \qquad \tau(x_0) = \tau.
\]
Summing up the conditions sufficient for (I$_0$) and (I) to hold (the finiteness of $\tau(x)$
and $\mathbf{E}\tau$) studied in Sect. 13.7, we obtain the following assertion in our case.




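Corollary 13.8.1 Let there exist a set $V \in \mathfrak{B}_{\mathcal{X}}$ such that, for the stopping time
$\tau_V(x) := \min\{k : X_k(x) \in V\}$, we have
\[
E := \sup_{x \in V} \mathbf{E}\tau_V(x) < \infty. \tag{13.8.3}
\]
Furthermore, let there exist an $m \ge 1$ such that
\[
\inf_{x \in V} \mathbf{P}\Bigl(\bigcup_{j=1}^{m} \{X_j(x) = x_0\}\Bigr) \ge q > 0.
\]
Then
\[
\mathbf{E}\tau \le \frac{mE}{q}.
\]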

This assertion follows from Lemma 13.7.3 applied with $W = \{x_0\}$. One can justify conditions (I$_0$) and
(13.8.3) by the following assertion.


Corollary 13.8.2 Let there exist an $\varepsilon > 0$ and a nonnegative measurable function
$g : \mathcal{X} \to \mathbb{R}$ such that
\[
\sup_{x \in V} \mathbf{E} g(X_1(x)) < \infty
\]
and, for $x \in V^c$,
\[
\mathbf{E} g(X_1(x)) - g(x) \le -\varepsilon.
\]
Then conditions (I$_0$) and (13.8.3) are met.


In order to formulate and prove the law of large numbers for the sums (13.8.2), we
will use the notion of the increment of the sums (13.8.2) on a cycle between consecutive
visits of the chain to the atom $x_0$. Divide the trajectory $X_0, X_1, X_2, \ldots, X_n$ of
the chain $X$ on the time interval $[0, n]$ into segments of lengths $\tau_1 := \tau(x), \tau_2, \tau_3, \ldots$
($\tau_j \overset{d}{=} \tau$ for $j \ge 2$) corresponding to the visits of the chain to the atom $x_0$. Denote
the increment of the sum $S_n$ on the $k$-th cycle (on $(T_{k-1}, T_k]$) by $\zeta_k$:
\[
\zeta_1 := \sum_{j=0}^{\tau_1} \xi_{X_j}, \qquad
\zeta_k := \sum_{j=T_{k-1}+1}^{T_k} \xi_{X_j}, \quad k \ge 2, \qquad \text{where } T_k := \sum_{j=1}^{k} \tau_j, \ k \ge 1, \quad T_0 = 0. \tag{13.8.4}
\]
The vectors $(\tau_k, \zeta_k)$, $k \ge 2$, are clearly independent and identically distributed. For
brevity, the index $k$ will sometimes be omitted: $(\tau_k, \zeta_k) \overset{d}{=} (\tau, \zeta)$ for $k \ge 2$.

Now we can state the law of large numbers for the sums (13.8.2).




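Theorem 13.8.1 Let $\mathbf{P}(\tau(x) < \infty) = 1$ for all $x$, $\mathbf{E}\tau < \infty$, $\mathbf{E}|\zeta| < \infty$, and the g.c.d.
of all possible values of $\tau$ equal 1. Then
\[
\frac{S_n}{n} \xrightarrow{p} \frac{\mathbf{E}\zeta}{\mathbf{E}\tau} \quad \text{as } n \to \infty.
\]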
k=1 
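Proof Put
\[
\nu(n) := \max\{k : T_k \le n\}.
\]
Then the sum $S_n$ can be represented as
\[
S_n = \zeta_1 + Z_{\nu(n)} + z_n, \tag{13.8.5}
\]
where
\[
Z_k := \sum_{j=2}^{k} \zeta_j, \qquad z_n := \sum_{j=T_{\nu(n)}+1}^{n} \xi_{X_j}.
\]
Since $\tau_1$ and $\zeta_1$ are proper random variables, we have, as $n \to \infty$,
\[
\frac{\zeta_1}{n} \xrightarrow{a.s.} 0. \tag{13.8.6}
\]
The sum $z_n$ consists of $\gamma(n) := n - T_{\nu(n)}$ summands. Theorem 10.3.1 implies that
the distribution of $\gamma(n)$ converges to a proper limiting distribution, and the same is
true for $z_n$. Hence, as $n \to \infty$,
\[
\frac{z_n}{n} \xrightarrow{p} 0. \tag{13.8.7}
\]
The sums $Z_{\nu(n)}$, being the main part of (13.8.5), are nothing else but a generalised
renewal process corresponding to the vectors $(\tau, \zeta)$ (see Sect. 10.6).
Since $\mathbf{E}\tau < \infty$, by Theorem 11.5.2, as $n \to \infty$,
\[
\frac{Z_{\nu(n)}}{n} \xrightarrow{p} \frac{\mathbf{E}\zeta}{\mathbf{E}\tau}. \tag{13.8.8}
\]
Together with (13.8.6) and (13.8.7) this means that
\[
\frac{S_n}{n} \xrightarrow{p} \frac{\mathbf{E}\zeta}{\mathbf{E}\tau}. \tag{13.8.9}
\]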

The theorem is proved. 


As was already noted, sufficient conditions for $\mathbf{P}(\tau(x) < \infty) = 1$ and $\mathbf{E}\tau < \infty$ to
hold are contained in Corollaries 13.8.1 and 13.8.2. It is more difficult to find conditions
sufficient for $\mathbf{E}|\zeta| < \infty$ that would be adequate for the nature of the problem.

Below we will obtain certain relations which clarify, to some extent, the connection
between the distributions of $\zeta$ and $\tau$ and the stationary distribution of the
chain $X$.




Theorem 13.8.2 (A generalisation of the Wald identity) Assume $\mathbf{E}\tau < \infty$, the g.c.d.
of all possible values of $\tau$ be 1, $\pi$ be the stationary distribution of the chain $X$, and
\[
\mathbf{E}_\pi \mathbf{E}|\xi_x| := \int \mathbf{E}|\xi_x|\,\pi(dx) < \infty. \tag{13.8.10}
\]
Then
\[
\mathbf{E}\zeta = \mathbf{E}\tau\,\mathbf{E}_\pi \mathbf{E}\xi_x. \tag{13.8.11}
\]
The value of $\mathbf{E}_\pi \mathbf{E}\xi_x$ is the “doubly averaged” value of the random variable $\xi_x$:
over the distribution $F_x$ and over the stationary distribution $\pi$.

Theorem 13.8.2 implies that the condition $\sup_x \mathbf{E}|\xi_x| < \infty$ is sufficient for the
finiteness of $\mathbf{E}|\zeta|$.


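Proof [of Theorem 13.8.2] First of all, we show that condition (13.8.10) implies
the finiteness of $\mathbf{E}|\zeta|$. If $\xi_x \ge 0$ then $\mathbf{E}\zeta$ is always well-defined. If we assume that
$\mathbf{E}\zeta = \infty$ then, repeating the proof of Theorem 13.8.1, we would easily obtain that,
in this case, $S_n/n \xrightarrow{p} \infty$, and hence necessarily $\mathbf{E} S_n/n \to \infty$ as $n \to \infty$. But
\[
\mathbf{E} S_n = \sum_{j=0}^{n} \mathbf{E}\xi_{X_j} = \sum_{j=0}^{n} \int (\mathbf{E}\xi_x)\,\mathbf{P}(X_j \in dx),
\]
where the distribution $\mathbf{P}(X_j \in \cdot)$ converges in total variation to $\pi(\cdot)$ as $j \to \infty$,
\[
\int (\mathbf{E}\xi_x)\,\mathbf{P}(X_j \in dx) \to \int (\mathbf{E}\xi_x)\,\pi(dx),
\]
and hence
\[
\frac{1}{n}\,\mathbf{E} S_n \to \mathbf{E}_\pi \mathbf{E}\xi_x < \infty. \tag{13.8.12}
\]
This contradicts the above assumption, and therefore $\mathbf{E}\zeta < \infty$. Applying the above
argument to the random variables $|\xi_x|$, we conclude that condition (13.8.10) implies
$\mathbf{E}|\zeta| < \infty$.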


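Let, as above, $\eta(n) := \nu(n) + 1 = \min\{k : T_k > n\}$. We will need the following.

Lemma 13.8.5 If $\mathbf{E}|\zeta| < \infty$ then
\[
\mathbf{E}\zeta_{\eta(n)} = o(n). \tag{13.8.13}
\]
If $\mathbf{E}\zeta^2 < \infty$ then
\[
\mathbf{E}\zeta_{\eta(n)}^2 = o(n) \tag{13.8.14}
\]
as $n \to \infty$.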




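Proof Without losing generality, assume that $\xi_x \ge 0$ and $\zeta \ge 0$. Since $\tau_j \ge 1$, we
have
\[
h(k) := \sum_{j=0}^{k} \mathbf{P}(T_j = k) \le 1 \quad \text{for all } k.
\]
Therefore,
\[
\mathbf{P}(\zeta_{\eta(n)} > v) = \sum_{k=0}^{n} h(k)\,\mathbf{P}(\zeta > v,\ \tau > n - k) \le \sum_{k=0}^{n} \mathbf{P}(\zeta > v,\ \tau > k).
\]
If $\mathbf{E}\zeta < \infty$ then
\[
\mathbf{E}\zeta_{\eta(n)} \le \sum_{k=0}^{n} \int_0^{\infty} \mathbf{P}(\zeta > v,\ \tau > k)\,dv = \sum_{k=0}^{n} \mathbf{E}(\zeta;\ \tau > k), \tag{13.8.15}
\]
where $\mathbf{E}(\zeta;\ \tau > k) \to 0$ as $k \to \infty$. This follows from Lemma A3.2.3 of Appendix 3.
Together with (13.8.15) this proves (13.8.13).

Similarly, for $\mathbf{E}\zeta^2 < \infty$,
\[
\mathbf{E}\zeta_{\eta(n)}^2 \le 2\sum_{k=0}^{n} \int_0^{\infty} v\,\mathbf{P}(\zeta > v,\ \tau > k)\,dv = \sum_{k=0}^{n} \mathbf{E}(\zeta^2;\ \tau > k) = o(n).
\]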


The lemma is proved. 


Now we continue the proof of Theorem 13.8.2. Consider representation (13.8.5)
for $X_0 = x_0$ and assume again that $\xi_x \ge 0$. Then $\zeta_1 = \xi_{x_0}$,
\[
S_n = \zeta_1 + Z_{\eta(n)} + z_n - \zeta_{\eta(n)},
\]
where by the Wald identity
\[
\mathbf{E} Z_{\eta(n)} \sim \mathbf{E}\eta(n)\,\mathbf{E}\zeta \sim n\,\frac{\mathbf{E}\zeta}{\mathbf{E}\tau}.
\]
Since $\pi(\{x_0\}) = 1/\mathbf{E}\tau > 0$, we have, by (13.8.10), $\mathbf{E}|\xi_{x_0}| < \infty$. Moreover, for
$\xi_x \ge 0$,
\[
|\zeta_{\eta(n)} - z_n| \le \zeta_{\eta(n)}.
\]
Hence, by Lemma 13.8.5,
\[
\mathbf{E} S_n = n\,\frac{\mathbf{E}\zeta}{\mathbf{E}\tau} + o(n). \tag{13.8.16}
\]
Combining this with (13.8.12), we obtain the assertion of the theorem.

It remains to consider the case where $\xi_x$ can take values of both signs. Introduce
new random variables $\xi_x^*$ on the chain $X$, defined by the equalities $\xi_x^* := |\xi_x|$, and
endow with the superscript $*$ all already used notations that will correspond to the
new random variables. Since all $\xi_x^* \ge 0$, by condition (13.8.10) we can apply to them
all the above assertions and, in particular, obtain that
\[
\mathbf{E}\zeta^* < \infty, \qquad \mathbf{E}\zeta^*_{\eta(n)} = o(n). \tag{13.8.17}
\]
Since
\[
|\zeta| \le \zeta^*, \qquad |\zeta_{\eta(n)}| \le \zeta^*_{\eta(n)}, \qquad |\zeta_{\eta(n)} - z_n| \le \zeta^*_{\eta(n)},
\]
it follows from (13.8.17) that
\[
\mathbf{E}|\zeta| < \infty, \qquad \mathbf{E}|\zeta_{\eta(n)} - z_n| = o(n)
\]
and relation (13.8.16) is valid along with identity (13.8.11).
The theorem is proved. 
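
Identity (13.8.11) can be checked numerically in the degenerate case $\xi_x = f(x)$ on a finite chain. The sketch below assumes a hypothetical three-state transition matrix `P`, takes state 0 as the atom $x_0$, and uses arbitrary values `f`; none of these come from the text. The mean cycle sum is found from the first-step equations $u(x) = \sum_y P(x, y)\bigl[h(y) + \mathrm{I}(y \ne x_0)\,u(y)\bigr]$, solved by simple iteration, and is compared with $\mathbf{E}\tau$ times the stationary mean.

```python
P = [[0.5, 0.3, 0.2],
     [0.4, 0.4, 0.2],
     [0.3, 0.3, 0.4]]        # hypothetical transition matrix, atom x0 = state 0
f = [1.0, -2.0, 3.0]         # degenerate case: xi_x = f(x)
n = len(P)

# Stationary distribution by power iteration: pi <- pi P.
pi = [1.0 / n] * n
for _ in range(1000):
    pi = [sum(pi[x] * P[x][y] for x in range(n)) for y in range(n)]

def cycle_mean(h):
    """E sum_{j=1}^{tau} h(X_j) for the chain started at the atom 0,
    where tau is the first return time to 0 (first-step equations
    solved by fixed-point iteration)."""
    u = [0.0] * n
    for _ in range(1000):
        u = [sum(P[x][y] * (h[y] + (u[y] if y != 0 else 0.0))
                 for y in range(n))
             for x in range(n)]
    return u[0]

E_tau = cycle_mean([1.0] * n)             # mean cycle length
E_zeta = cycle_mean(f)                    # mean increment over a cycle
stationary_mean = sum(pi[x] * f[x] for x in range(n))
print(E_zeta, E_tau * stationary_mean)    # the two sides of (13.8.11)
```

The same run also illustrates the relation $\pi(\{x_0\}) = 1/\mathbf{E}\tau$ used in the proof: `E_tau * pi[0]` is equal to 1 up to rounding.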


Now we will prove the strong law of large numbers. 


Theorem 13.8.3 Let the conditions of Theorem 13.8.1 be satisfied. Then
\[
\frac{S_n}{n} \xrightarrow{a.s.} \mathbf{E}_\pi \mathbf{E}\xi_x \quad \text{as } n \to \infty.
\]


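Proof Since in representation (13.8.5) one has $\zeta_1/n \xrightarrow{a.s.} 0$ as $n \to \infty$, we can
neglect this term in (13.8.5).

The strong laws of large numbers for $\{Z_k\}$ and $\{T_k\}$ mean that, for a given $\varepsilon > 0$,
the trajectory of $\{S_{T_k}\}$ will lie within the boundaries $k\,\mathbf{E}\zeta\,(1 \pm \varepsilon)$ and $\dfrac{\mathbf{E}\zeta}{\mathbf{E}\tau}\,T_k(1 \pm 2\varepsilon)$
for all $k \ge n$ and $n$ large enough. (We leave a more formal formulation of this to the
reader.)

We will prove the theorem if we verify that the probability of the event that, between
the times $T_k$, $k \ge n$, the trajectory of $S_j$ will cross at least once the boundaries
$rj(1 \pm 3\varepsilon)$, where $r = \dfrac{\mathbf{E}\zeta}{\mathbf{E}\tau}$, tends to zero as $n \to \infty$. Since
\[
\max_{T_{k-1} < j \le T_k} |S_j - S_{T_k}| \le \zeta_k^* \tag{13.8.18}
\]
(in the notation of the proof of Theorem 13.8.2), it is sufficient to verify that
$\mathbf{P}(A_n) \to 0$ as $n \to \infty$, where $A_n := \bigcup_{k=n}^{\infty} \{\zeta_k^* > \varepsilon r T_k\}$. But
\[
\mathbf{P}(A_n) = \mathbf{P}(A_n B_n) + \mathbf{P}(A_n \overline{B}_n), \tag{13.8.19}
\]
where
\[
B_n := \bigcap_{k=n}^{\infty} \{T_k > k\,\mathbf{E}\tau\,(1 - \varepsilon)\}, \qquad \mathbf{P}(\overline{B}_n) \to 0 \quad \text{as } n \to \infty,
\]
so the second summand in (13.8.19) tends to zero. The first summand on the right-hand
side of (13.8.19) does not exceed (for $c = \varepsilon(1 - \varepsilon)\mathbf{E}\zeta$)
\[
\mathbf{P}\Bigl(\bigcup_{k=n}^{\infty} \{\zeta_k^* > \varepsilon\,\mathbf{E}\zeta\,k(1 - \varepsilon)\}\Bigr) \le \sum_{k=n}^{\infty} \mathbf{P}(\zeta^* > ck) \to 0
\]
as $n \to \infty$, since $\mathbf{E}\zeta^* < \infty$ (see (13.8.17)). The theorem is proved.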


13.8.3 The Central Limit Theorem 


As in Theorem 13.8.1, first we will prove the main assertion under certain conditions
on the moments of $\zeta$ and $\tau$, and then we will establish a connection of these
conditions to the stationary distribution of the chain $X$. Below we retain the notation
of the previous section.


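Theorem 13.8.4 Let $\mathbf{P}(\tau(x) < \infty) = 1$ for any $x$, $\mathbf{E}\tau^2 < \infty$, the g.c.d. of all possible
values of $\tau$ be 1, and $\mathbf{E}\zeta^2 < \infty$. Then, as $n \to \infty$,
\[
\frac{S_n - rn}{d\sqrt{n/a}} \Rightarrow \Phi_{0,1},
\]
where $r := a_\zeta/a$, $a_\zeta := \mathbf{E}\zeta$, $a := \mathbf{E}\tau$ and $d^2 := \mathbf{D}(\zeta - r\tau)$.

Proof We again make use of representation (13.8.5), where clearly
\[
\frac{\zeta_1}{\sqrt{n}} \xrightarrow{p} 0, \qquad \frac{z_n}{\sqrt{n}} \xrightarrow{p} 0
\]
(see the proof of Theorem 13.8.1). This means that the problem reduces to that of
finding the limiting distribution of $Z_{\nu(n)} = Z_{\eta(n)} - \zeta_{\eta(n)}$, where by Lemma 10.6.1
$\zeta_{\eta(n)}$ has a proper limiting distribution, and so $\zeta_{\eta(n)}/\sqrt{n} \xrightarrow{p} 0$ as $n \to \infty$. Furthermore,
by Theorem 10.6.3,
\[
\frac{Z_{\eta(n)} - rn}{\sigma_S\sqrt{n}} \Rightarrow \Phi_{0,1},
\]
where $\sigma_S^2 := a^{-1}\mathbf{D}(\zeta - r\tau) = a^{-1}d^2$. The theorem is proved.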


Now we will establish relations between the moment characteristics used for
normalising $S_n$ and the stationary distribution $\pi$. The answer for the number $r$ was
given in Theorem 13.8.2: $r = \mathbf{E}_\pi \mathbf{E}\xi_x$. For the number $\sigma_S^2$ we have the following
result.


Theorem 13.8.5 Let
\[
\sigma^2 := \int \mathbf{D}\xi_x\,\pi(dx) + 2\sum_{j=1}^{\infty} \mathbf{E}(\xi_{X_0} - r)(\xi_{X_j} - r)
\]
be well-defined and finite, where $X_0$ has the distribution $\pi$. Then
\[
\sigma_S^2 := a^{-1}d^2 = \sigma^2.
\]
Note that here the expectation under the sum sign is a “triple averaging”: over
the distribution $\pi(dy)P(y, j, dz)$ and the distributions of $\xi_y$ and $\xi_z$.


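Proof We have
\[
\mathbf{E}(S_n - rn)^2 = \mathbf{E}\Bigl[\sum_{k=0}^{n} (\xi_{X_k} - r)\Bigr]^2
= \sum_{k=0}^{n} \mathbf{E}(\xi_{X_k} - r)^2 + 2\sum_{k<j} \mathbf{E}(\xi_{X_k} - r)(\xi_{X_j} - r), \tag{13.8.20}
\]
where
\[
\sum_{k=0}^{n} \mathbf{E}(\xi_{X_k} - r)^2 = \sum_{k=0}^{n} \mathbf{E}(\xi_{X_k} - \mathbf{E}\xi_{X_k})^2 + \sum_{k=0}^{n} (\mathbf{E}\xi_{X_k} - r)^2. \tag{13.8.21}
\]
The summands in the first sum on the right-hand side of (13.8.21) converge to $\sigma_\xi^2 :=
\int \mathbf{D}\xi_x\,\pi(dx)$, the summands in the second sum converging to zero. Therefore, the
left-hand side of (13.8.21) is asymptotically equivalent to $n\sigma_\xi^2$.

Further,
\[
\sum_{k<j} \mathbf{E}(\xi_{X_k} - r)(\xi_{X_j} - r) = \sum_{k=0}^{n-1} \sum_{j=k+1}^{n} \mathbf{E}(\xi_{X_k} - r)(\xi_{X_j} - r), \tag{13.8.22}
\]
where the distribution of $X_k$ converges in total variation to the stationary distribution
$\pi$ of the chain. Hence the inner sums on the right-hand side of (13.8.22), for
large $k$ and $n - k$ (say, for $\sqrt{n} < k < n - \sqrt{n}$ when $n \to \infty$), will be close to
\[
E := \sum_{j=1}^{\infty} \mathbf{E}(\xi_{X_0} - r)(\xi_{X_j} - r),
\]
where $X_0$ has the distribution $\pi$, and the whole sum on the right-hand side of (13.8.22) is asymptotically
equivalent, as $n \to \infty$, to $nE$ (or will be $o(n)$ if $E = 0$).

Thus
\[
\frac{1}{n}\,\mathbf{E}(S_n - rn)^2 \sim \sigma_\xi^2 + 2E. \tag{13.8.23}
\]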




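We now show that the existence of $\sigma_\xi^2$ and $E$ implies the finiteness of $d^2 = \mathbf{E}(\zeta - r\tau)^2$.

Consider the truncated random variables
\[
\xi_x^{(N)} := \begin{cases} \xi_x & \text{if } \xi_x \in [-N, N], \\ N & \text{if } \xi_x > N, \\ -N & \text{if } \xi_x < -N. \end{cases}
\]
Since $\sigma_\xi^2 < \infty$, we have $\mathbf{E}\xi_x^2 < \infty$ (a.e. with respect to the measure $\pi$) and
\[
\bigl(\sigma_\xi^{(N)}\bigr)^2 \to \sigma_\xi^2, \qquad E^{(N)} \to E \quad \text{as } N \to \infty,
\]
where the superscript $(N)$ means that the notation corresponds to the truncated random
variables. By virtue of Theorem 13.8.4,
\[
\liminf_{n \to \infty} \frac{1}{n}\,\mathbf{E}\bigl(S_n^{(N)} - r^{(N)} n\bigr)^2 \ge a^{-1}\bigl(d^{(N)}\bigr)^2.
\]
If we assume that $d = \infty$ then we will get that the $\liminf$ on the left-hand side of this
relation grows unboundedly as $N \to \infty$. But this contradicts relation (13.8.23), by which the above $\liminf$
equals $\bigl(\sigma_\xi^{(N)}\bigr)^2 + 2E^{(N)}$ and remains bounded. We have obtained a contradiction,
which shows that $d < \infty$.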
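On the other hand, for $d < \infty$, $\mathbf{E}\zeta^2 < \infty$ and, for the initial value $x_0$, by (13.8.5)
we have
\[
\begin{aligned}
\mathbf{E}(S_n - rn)^2 &= \mathbf{E}\bigl(Z_{\eta(n)} + z_n - \zeta_{\eta(n)} - rn\bigr)^2 \\
&= \mathbf{E}\bigl(Z_{\eta(n)} - rn\bigr)^2 + 2\,\mathbf{E}\bigl(Z_{\eta(n)} - rn\bigr)\bigl(z_n - \zeta_{\eta(n)}\bigr) + \mathbf{E}\bigl(z_n - \zeta_{\eta(n)}\bigr)^2,
\end{aligned} \tag{13.8.24}
\]
where $n = T_{\eta(n)} - \chi(n)$. Therefore, putting $Y_k := Z_k - rT_k = \sum_{j=2}^{k} (\zeta_j - r\tau_j)$, we
obtain
\[
\mathbf{E}\bigl(Z_{\eta(n)} - rn\bigr)^2 = \mathbf{E} Y_{\eta(n)}^2 + 2r\,\mathbf{E} Y_{\eta(n)}\chi(n) + r^2\,\mathbf{E}\chi^2(n).
\]
By virtue of (10.4.7), $\mathbf{E}\chi^2(n) = o(n)$. By (10.6.4) (with a somewhat different notation),
\[
\mathbf{E} Y_{\eta(n)}^2 = d^2\,\mathbf{E}\eta(n),
\]
where $d^2 := \mathbf{D}(\zeta - r\tau)$, $\mathbf{E}\eta(n) \sim n/a$ and $a = \mathbf{E}\tau$. Hence, applying the
Cauchy-Bunjakovsky inequality, we get
\[
|\mathbf{E} Y_{\eta(n)}\chi(n)| = o(n), \qquad \mathbf{E}\bigl(Z_{\eta(n)} - rn\bigr)^2 \sim n d^2 a^{-1}. \tag{13.8.25}
\]
It remains to estimate the last two terms on the right-hand side of (13.8.24). But
\[
|\zeta_{\eta(n)} - z_n| \le \zeta^*_{\eta(n)},
\]
where $\zeta^*$ corresponds to the summands $\xi^*_{X_j} = |\xi_{X_j}|$ and where, by Lemma 13.8.5
applied to $\xi_x^* = |\xi_x|$, we have
\[
\mathbf{E}\bigl(\zeta^*_{\eta(n)}\bigr)^2 = o(n).
\]
Therefore $\mathbf{E}(\zeta_{\eta(n)} - z_n)^2 = o(n)$ and, by the Cauchy-Bunjakovsky inequality and
relation (13.8.25), the same relation is valid for the mixed moment in (13.8.24).
Thus,
\[
\mathbf{E}(S_n - rn)^2 \sim a^{-1}d^2 n.
\]
Combining this relation with (13.8.23), we obtain the assertion of the theorem.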


Chapter 14 
Information and Entropy 


Abstract Section 14.1 presents the definitions and key properties of information 
and entropy. Section 14.2 discusses the entropy of a (stationary) finite Markov chain. 
The Law of Large Numbers is proved for the amount of information contained in 
a message that is a long sequence of successive states of a Markov chain, and the 
asymptotic behaviour of the number of the most common states in a sequence of 
successive values of the chain is established. Applications of this result to coding 
are discussed. 


14.1 The Definitions and Properties of Information and Entropy 


Suppose one conducts an experiment whose outcome is not predetermined. The 
term “experiment” will have a broad meaning. It may be a test of a new device, a 
satellite launch, a football match, a referendum and so on. If, in a football match, 
the first team is stronger than the second, then the occurrence of the event A that the 
first team won carries little significant information. On the contrary, the occurrence
of the complementary event $\overline{A}$ contains a lot of information. The event $B$ that a
leading player of the first team was injured does contain information concerning the 
event A. But if it was the first team’s doctor who was injured then that would hardly 
affect the match outcome, so such an event B carries no significant information 
about the event A. 

The following quantitative measure of information is conventionally adopted. Let
$A$ and $B$ be events from some probability space $(\Omega, \mathfrak{F}, \mathbf{P})$.


Definition 14.1.1 The amount of information about the event $A$ contained in the
event (message) $B$ is the quantity
\[
I(A|B) := \log \frac{\mathbf{P}(A|B)}{\mathbf{P}(A)}.
\]

The notions of the “amount of information” and “entropy” were introduced by C.E. Shannon in 
1948. For some special situations the notion of amount of information had also been considered in 
earlier papers (e.g., by R.V.L. Hartley, 1928). The exposition in Sect. 14.2 of this chapter is 
substantially based on the paper of A. Ya. Khinchin [21]. 


A.A. Borovkov, Probability Theory, Universitext, 447 
DOI 10.1007/978-1-4471-5201-9_14, © Springer-Verlag London 2013 




The occurrence of the event B = A may be interpreted as the message that A 
took place. 


Definition 14.1.2 The number $I(A) := I(A|A)$ is called the amount of information
contained in the message $A$:
\[
I(A) := I(A|A) = -\log \mathbf{P}(A).
\]


We see from this definition that the larger the probability of the event $A$, the
smaller $I(A)$. As a rule, the logarithm to the base 2 is used in the definition of information.
Thus, say, the message that a boy (or girl) was born in a family carries a unit
of information (it is supposed that these events are equiprobable, and $-\log_2 p = 1$
for $p = 1/2$). Throughout this chapter, we will write just $\log x$ for $\log_2 x$.

If the events $A$ and $B$ are independent, then $I(A|B) = 0$. This means that the
event $B$ does not carry any information about $A$, and vice versa. It is worth noting
that we always have
\[
I(A|B) = I(B|A).
\]
It is easy to see that if the events $A$ and $B$ are independent, then
\[
I(AB) = I(A) + I(B). \tag{14.1.1}
\]


Consider an example. Let a chessman be placed at random on one of the squares
of a chessboard. The information that the chessman is on square number $k$ (the
event $A$) is equal to $I(A) = \log 64 = 6$. Let $B_1$ be the event that the chessman
is in the $i$-th row, and $B_2$ that the chessman is in the $j$-th column. The message
$A$ can be transmitted by transmitting $B_1$ first and then $B_2$. We have

$$I(B_1) = \log 8 = 3 = I(B_2).$$

Therefore

$$I(B_1) + I(B_2) = 6 = I(A),$$


so that transmitting the message A “by parts” requires communicating the same 
amount of information (which is equal to 6) as transmitting A itself. One could 
give other examples showing that the introduced numerical characteristics are quite 
natural. 
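The chessboard computation can be checked numerically. The following is a minimal sketch; the function names `information` and `mutual_information` are illustrative choices, not notation from the text.

```python
import math

def information(p):
    """Amount of information -log2 P(A), in bits, of an event of probability p."""
    return -math.log2(p)

def mutual_information(p_a_given_b, p_a):
    """I(A|B) = log2( P(A|B) / P(A) ), in bits."""
    return math.log2(p_a_given_b / p_a)

i_square = information(1 / 64)  # event A: the chessman is on a given square
i_row = information(1 / 8)      # event B1: the chessman is in a given row
i_col = information(1 / 8)      # event B2: the chessman is in a given column

# Knowing the row leaves 8 equally likely squares, so P(A|B1) = 1/8.
i_a_given_row = mutual_information(1 / 8, 1 / 64)

print(i_square, i_row + i_col, i_a_given_row)
```

Here the row message alone supplies half of the information contained in the message A.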

Let $G$ be an experiment with outcomes $E_1, \dots, E_N$ occurring with probabilities
$p_1, \dots, p_N$.

The information resulting from the experiment $G$ is a random variable
$J_G = J_G(\omega)$ assuming the value $-\log p_j$ on the set $E_j$, $j = 1, \dots, N$.

Thus, if in the probability space $(\Omega, \mathfrak{F}, \mathbf{P})$ corresponding to
the experiment $G$, $\Omega$ coincides with the set $(E_1, \dots, E_N)$, then
$J_G(\omega) = I(\omega)$.


Definition 14.1.3 The expectation of the information obtained in the experiment $G$,
$\mathbf{E} J_G = -\sum p_j \log p_j$, is called the entropy of the experiment. We shall
denote it by

$$H_p = H(G) := -\sum_{j=1}^{N} p_j \log p_j,$$

where $p = (p_1, \dots, p_N)$. For $p_j = 0$, by continuity we set $p_j \log p_j$ to be
equal to zero.

Fig. 14.1 The plot of the entropy $f(p)$ of a random experiment with two outcomes


The entropy of an experiment is, in a sense, a measure of its uncertainty. Let,
for example, our experiment have two outcomes $A$ and $B$ with probabilities $p$ and
$q = 1 - p$, respectively. The entropy of the experiment is equal to

$$H_p = -p \log p - (1 - p) \log(1 - p) = f(p).$$


The graph of this function is depicted in Fig. 14.1. 

The only maximum of $f(p)$ equals $\log 2 = 1$ and is attained at the point $p = 1/2$.
This is the case of maximum uncertainty. If $p$ decreases from $1/2$, then the
uncertainty also decreases together with $H_p$, and $H_p = 0$ for $(p, q) = (0, 1)$ or
$(1, 0)$.

The same properties can easily be seen in the general case as well. 


The properties of entropy.
1. $H(G) = 0$ if and only if there exists a $j$, $1 \le j \le N$, such that $p_j = \mathbf{P}(E_j) = 1$.
2. $H(G)$ attains its maximum when $p_j = 1/N$ for all $j$.


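Proof The second derivative of the function $\beta(x) = x \log x$ is positive on $(0, 1]$,
so that $\beta(x)$ is convex. Therefore, for any $q_i \ge 0$ such that $\sum_{i=1}^{N} q_i = 1$,
and any $x_i \ge 0$, one has the inequality

$$\beta\Big(\sum_{i=1}^{N} q_i x_i\Big) \le \sum_{i=1}^{N} q_i \beta(x_i).$$

If we take $q_i = 1/N$, $x_i = p_i$, then

$$\Big(\frac{1}{N}\sum_{i=1}^{N} p_i\Big) \log\Big(\frac{1}{N}\sum_{i=1}^{N} p_i\Big) \le \frac{1}{N}\sum_{i=1}^{N} p_i \log p_i.$$

Setting $u := \big(\frac{1}{N}, \dots, \frac{1}{N}\big)$ and recalling that $\sum_{i=1}^{N} p_i = 1$,
we obtain from this that

$$-\log\frac{1}{N} = \log N = H_u \ge -\sum_{i=1}^{N} p_i \log p_i = H_p.$$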


Note that if the entropy $H(G)$ equals its maximum value $H(G) = \log N$, then
$J_G(\omega) = \log N$ with probability 1, i.e. the information $J_G(\omega)$ becomes constant.
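Properties 1 and 2 are easy to observe numerically. The sketch below uses a hypothetical helper `entropy` (base-2 logarithm, with the convention $0 \log 0 = 0$) and a few hand-picked distributions.

```python
import math

def entropy(probs):
    """Entropy -sum p_j log2 p_j in bits; terms with p_j = 0 contribute zero."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

degenerate = [1.0, 0.0, 0.0, 0.0]   # one outcome is certain
skewed = [0.7, 0.1, 0.1, 0.1]       # some uncertainty
uniform = [0.25, 0.25, 0.25, 0.25]  # maximum uncertainty for N = 4

for name, dist in [("degenerate", degenerate), ("skewed", skewed), ("uniform", uniform)]:
    print(name, entropy(dist))

# Two-outcome case f(p): the largest value on a grid is found at p = 1/2.
grid = [k / 100 for k in range(1, 100)]
best_p = max(grid, key=lambda p: entropy([p, 1 - p]))
```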




3. Let $G_1$ and $G_2$ be two independent experiments. We write down the outcomes
and their probabilities in these experiments in the following way:

$$G_1 = \begin{pmatrix} E_1, \dots, E_N \\ p_1, \dots, p_N \end{pmatrix}, \qquad
G_2 = \begin{pmatrix} A_1, \dots, A_M \\ q_1, \dots, q_M \end{pmatrix}.$$

Combining the outcomes of these two experiments we obtain a new experiment

$$G = G_1 \times G_2 = \begin{pmatrix} E_1A_1, E_1A_2, \dots, E_NA_M \\ p_1q_1, p_1q_2, \dots, p_Nq_M \end{pmatrix}.$$

The information $J_G$ obtained as a result of this experiment is a random variable
taking values $-\log p_i q_j$ with probabilities $p_i q_j$, $i = 1, \dots, N$; $j = 1, \dots, M$.
But the sum $J_{G_1} + J_{G_2}$ of two independent random variables equal to the amounts of
information obtained in the experiments $G_1$ and $G_2$, respectively, clearly has the
same distribution. Thus the information obtained in a sequence of independent
experiments is equal to the sum of the information from these experiments. Since in
that case clearly

$$\mathbf{E} J_G = \mathbf{E} J_{G_1} + \mathbf{E} J_{G_2},$$

we have that for independent $G_1$ and $G_2$ the entropy of the experiment $G$ is equal
to the sum of the entropies of the experiments $G_1$ and $G_2$:

$$H(G) = H(G_1) + H(G_2).$$
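Additivity for independent experiments can be verified on a small example. The distributions `p` and `q` below are arbitrary illustrative choices, and `entropy` is a hypothetical helper.

```python
import math
from itertools import product

def entropy(probs):
    """Entropy -sum p log2 p in bits; zero-probability outcomes contribute zero."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

p = [0.5, 0.3, 0.2]  # outcome probabilities of G1 (illustrative)
q = [0.6, 0.4]       # outcome probabilities of G2 (illustrative)

# Independent experiments: the outcome E_i A_j of G1 x G2 has probability p_i q_j.
joint = [p_i * q_j for p_i, q_j in product(p, q)]

print(entropy(joint), entropy(p) + entropy(q))
```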


4. If the experiments $G_1$ and $G_2$ are dependent, then the experiment $G$ can be
represented as

$$G = \begin{pmatrix} E_1A_1, E_1A_2, \dots, E_NA_M \\ q_{11}, q_{12}, \dots, q_{NM} \end{pmatrix}$$

with $q_{ij} = p_i p_{ij}$, where $p_{ij}$ is the conditional probability of the event $A_j$
given $E_i$, so that

$$\sum_{j=1}^{M} q_{ij} = p_i = \mathbf{P}(E_i), \quad i = 1, \dots, N; \qquad
\sum_{i=1}^{N} q_{ij} = q_j = \mathbf{P}(A_j), \quad j = 1, \dots, M.$$

In this case the equality $J_G = J_{G_1} + J_{G_2}$, generally speaking, does not hold.
Introduce a random variable $J_2^*$ which is equal to $-\log p_{ij}$ on the set $E_iA_j$. Then
evidently $J_G = J_{G_1} + J_2^*$. Since

$$\mathbf{P}(A_j | E_i) = p_{ij},$$

the quantity $J_2^*$ for a fixed $i$ can be considered as the information from the
experiment $G_2$ given the event $E_i$ occurred. We will call the quantity

$$\mathbf{E}(J_2^* | E_i) = -\sum_{j} p_{ij} \log p_{ij}$$

the conditional entropy $H(G_2|E_i)$ of the experiment $G_2$ given $E_i$, and the quantity

$$\mathbf{E} J_2^* = -\sum_{i,j} q_{ij} \log p_{ij} = \sum_{i} p_i H(G_2|E_i)$$

the conditional entropy $H(G_2|G_1)$ of the experiment $G_2$ given $G_1$. In this notation,
we obviously have

$$H(G) = H(G_1) + H(G_2|G_1).$$

We will prove that in this equality we always have

$$H(G_2|G_1) \le H(G_2),$$

i.e. for two experiments $G_1$ and $G_2$ the entropy $H(G)$ never exceeds the sum of the
entropies $H(G_1)$ and $H(G_2)$:

$$H(G) = H(G_1 \times G_2) \le H(G_1) + H(G_2).$$

Equality takes place here only when $q_{ij} = p_i q_j$, i.e. when $G_1$ and $G_2$ are
independent.


Proof First note that, for any two distributions $(u_1, \dots, u_n)$ and $(v_1, \dots, v_n)$, one
has the inequality

$$-\sum_i u_i \log u_i \le -\sum_i u_i \log v_i, \tag{14.1.2}$$

equality being possible here only if $v_i = u_i$, $i = 1, \dots, n$. This follows from the
concavity of the function $\log x$, since it implies that, for any $a_i > 0$,

$$\sum_i u_i \log a_i \le \log\Big(\sum_i u_i a_i\Big),$$

equality being possible only if $a_1 = a_2 = \dots = a_n$. Putting $a_i = v_i/u_i$, we obtain
relation (14.1.2).

Next we have

$$H(G_1) + H(G_2) = -\sum_{i,j} q_{ij} (\log p_i + \log q_j) = -\sum_{i,j} q_{ij} \log p_i q_j,$$

and because $\{p_i q_j\}$ is obviously a distribution, by virtue of (14.1.2)

$$-\sum_{i,j} q_{ij} \log p_i q_j \ge -\sum_{i,j} q_{ij} \log q_{ij} = H(G)$$

holds, and equality is possible here only if $q_{ij} = p_i q_j$.
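The decomposition $H(G) = H(G_1) + H(G_2|G_1)$ and the inequality $H(G_2|G_1) \le H(G_2)$ can be illustrated on a small joint table. The matrix `q` below is an arbitrary example of dependent experiments, and the helper names are hypothetical.

```python
import math

def entropy(probs):
    """Entropy -sum p log2 p in bits; zero-probability outcomes contribute zero."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def conditional_entropy(q):
    """H(G2|G1) = -sum_{i,j} q_ij log2 p_ij, where p_ij = q_ij / p_i."""
    total = 0.0
    for row in q:
        p_i = sum(row)
        for q_ij in row:
            if q_ij > 0:
                total -= q_ij * math.log2(q_ij / p_i)
    return total

# Joint probabilities q_ij = P(E_i A_j); rows index E_i, columns index A_j.
q = [[0.30, 0.10],
     [0.05, 0.25],
     [0.10, 0.20]]

p = [sum(row) for row in q]             # marginal distribution of G1
q_marg = [sum(col) for col in zip(*q)]  # marginal distribution of G2

h_joint = entropy([x for row in q for x in row])
h1, h2 = entropy(p), entropy(q_marg)
h2_given_1 = conditional_entropy(q)

print(h_joint, h1 + h2_given_1, h1 + h2)
```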


5. As we saw when considering property 3, the information obtained as a result of
the experiment $G_1^n$ consisting of $n$ independent repetitions of the experiment $G_1$
is equal to

$$J_{G_1^n} = -\sum_{j=1}^{N} \nu_j \log p_j,$$

where $\nu_j$ is the number of occurrences of the outcome $E_j$. By the law of large
numbers, $\nu_j/n \xrightarrow{p} p_j$ as $n \to \infty$, and hence

$$\frac{1}{n} J_{G_1^n} \xrightarrow{p} H(G_1) = H_p.$$
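Property 5 can be observed in a simulation. The sketch below is an illustration under assumptions: the distribution `probs`, the sample size, and the random seed are arbitrary, and all probabilities are taken to be strictly positive.

```python
import math
import random

def entropy(probs):
    """Entropy -sum p log2 p in bits; zero-probability outcomes contribute zero."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_per_trial(probs, n, rng):
    """J/n = -(1/n) sum_j nu_j log2 p_j for n independent repetitions."""
    outcomes = rng.choices(range(len(probs)), weights=probs, k=n)
    counts = [0] * len(probs)
    for j in outcomes:
        counts[j] += 1          # nu_j: number of occurrences of outcome j
    return -sum(c * math.log2(p) for c, p in zip(counts, probs)) / n

probs = [0.5, 0.3, 0.2]
rng = random.Random(0)
rate = information_per_trial(probs, 100_000, rng)
print(rate, entropy(probs))
```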


To conclude this section, we note that the measure of the amount of information 
resulting from an experiment we considered here can be derived as the only possible 
one (up to a constant multiplier) if one starts with a few simple requirements that 
are natural to impose on such a quantity.¹

It is also interesting to note the connections between the above-introduced
notions and large deviation probabilities. As one can see from Theorems 5.1.2 and
5.2.4, the difference between the “biased” entropy $-\sum p_j^* \ln p_j$ and the entropy
$-\sum p_j^* \ln p_j^*$ ($p_j^* = \nu_j/n$ are the relative frequencies of the outcomes $E_j$) is an
analogue of the deviation function (see Sect. 8.8) in the multi-dimensional case.


14.2 The Entropy of a Finite Markov Chain. A Theorem on the 
Asymptotic Behaviour of the Information Contained in a 
Long Message; Its Applications 


14.2.1 The Entropy of a Sequence of Trials Forming a Stationary 
Markov Chain 


Let $\{X_k\}_{k=1}^{\infty}$ be a stationary finite Markov chain with one class of essential
states without subclasses, $E_1, \dots, E_N$ being its states. Stationarity of the chain
means that $\mathbf{P}(X_1 = j) = \pi_j$ coincide with the stationary probabilities. It is clear that

$$\mathbf{P}(X_2 = j) = \sum_k \pi_k p_{kj} = \pi_j, \quad \mathbf{P}(X_3 = j) = \pi_j, \quad \text{and so on.}$$

Let $G_k$ be an experiment determining the value of $X_k$ (i.e. the state the system
entered on the $k$-th step). If $X_{k-1} = i$, then the entropy of the $k$-th step equals

$$H(G_k | X_{k-1} = i) = -\sum_j p_{ij} \log p_{ij}.$$

By definition, the entropy of a stationary Markov chain is equal to

$$H = \mathbf{E} H(G_k | X_{k-1}) = H(G_k | G_{k-1}) = -\sum_i \pi_i \sum_j p_{ij} \log p_{ij}.$$


Consider the first $n$ steps $X_1, \dots, X_n$ of the Markov chain. By the Markov
property, the entropy of this composite experiment $G^{(n)} = G_1 \times \dots \times G_n$ is equal to

$$H(G^{(n)}) = H(G_1) + H(G_2|G_1) + \dots + H(G_n|G_{n-1})
= -\sum_j \pi_j \log \pi_j + (n-1)H \sim nH$$

as $n \to \infty$. If $X_k$ were independent then, as we saw, we would have exact equality
here.

¹See, e.g., [11].
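As an illustration, the entropy of a small stationary chain can be computed directly from its transition matrix. The two-state matrix `P` below is a hypothetical example, and the stationary distribution is approximated by repeated multiplication, which assumes the chain is ergodic.

```python
import math

def entropy(probs):
    """Entropy -sum p log2 p in bits; zero-probability outcomes contribute zero."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def stationary_distribution(P, iterations=1000):
    """Approximate pi with pi = pi P by iterating from the uniform distribution."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def chain_entropy(P, pi):
    """H = -sum_i pi_i sum_j p_ij log2 p_ij."""
    return sum(pi[i] * entropy(P[i]) for i in range(len(P)))

def entropy_of_first_n_steps(P, pi, n):
    """H(G^(n)) = H(G_1) + (n - 1) H for a stationary chain."""
    return entropy(pi) + (n - 1) * chain_entropy(P, pi)

P = [[0.9, 0.1],
     [0.4, 0.6]]                    # illustrative transition matrix
pi = stationary_distribution(P)     # the exact solution here is (0.8, 0.2)
H = chain_entropy(P, pi)
ratio = entropy_of_first_n_steps(P, pi, 1000) / (1000 * H)
print(pi, H, ratio)
```

In this example $H$ is smaller than the entropy $-\sum_j \pi_j \log \pi_j$ of a single step, in agreement with property 4 of Sect. 14.1.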


14.2.2 The Law of Large Numbers for the Amount of Information 
Contained in a Message 


Now consider a finite sequence $(X_1, \dots, X_n)$ as a message (event) $C_n$, and denote,
as before, by $I(C_n) = -\log \mathbf{P}(C_n)$ the amount of information contained in $C_n$.
The value of $I(C_n)$ is a function on the space of elementary outcomes equal to
the information $J_{G^{(n)}}$ contained in the experiment $G^{(n)}$. We now show that, with
probability close to 1, this information behaves asymptotically as $nH$, as was the
case for independent $X_k$. Therefore $H$ is essentially the average information per
trial in the sequence $\{X_k\}_{k=1}^{\infty}$.


Theorem 14.2.1 As $n \to \infty$,

$$\frac{I(C_n)}{n} = \frac{-\log \mathbf{P}(C_n)}{n} \xrightarrow{a.s.} H.$$

This means that, for any $\delta > 0$, the set of all messages $C_n$ can be decomposed into
two classes. For the first class, $|I(C_n)/n - H| < \delta$, and the sum of the probabilities
of the elements of the second class tends to 0 as $n \to \infty$.


Proof Construct from the given Markov chain a new one $\{Y_k\}_{k=1}^{\infty}$ by setting
$Y_k := (X_k, X_{k+1})$. The states of the new chain are pairs of states $(E_i, E_j)$ of the
chain $\{X_k\}$ with $p_{ij} > 0$. The transition probabilities are obviously given by

$$p_{(i,j)(k,l)} = \begin{cases} 0, & j \ne k, \\ p_{kl}, & j = k. \end{cases}$$

Note that one can easily prove by induction that

$$p_{(i,j)(k,l)}(n) = p_{jk}(n-1)\, p_{kl}. \tag{14.2.1}$$

From the definition of $\{Y_k\}$ it follows that the ergodic theorem holds for this chain.
This can also be seen directly from (14.2.1), the stationary probabilities being

$$\lim_{n \to \infty} p_{(i,j)(k,l)}(n) = \pi_k p_{kl}.$$

Now we will need the law of large numbers for the number of visits $m_{(k,l)}(n)$
of the chain $\{Y_k\}_{k=1}^{\infty}$ to state $(k,l)$ over time $n$. By virtue of this law (see
Theorem 13.4.4),

$$\frac{m_{(k,l)}(n)}{n} \xrightarrow{a.s.} \pi_k p_{kl} \quad \text{as } n \to \infty.$$
n 




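Consider the random variable $\mathbf{P}(C_n)$:

$$\mathbf{P}(C_n) = \mathbf{P}(E_{X_1} E_{X_2} \cdots E_{X_n})
= \mathbf{P}(E_{X_1}) \mathbf{P}(E_{X_2} | E_{X_1}) \cdots \mathbf{P}(E_{X_n} | E_{X_{n-1}})$$

$$= \pi_{X_1} p_{X_1 X_2} \cdots p_{X_{n-1} X_n} = \pi_{X_1} \prod_{(k,l)} p_{kl}^{m_{(k,l)}(n-1)}.$$

The product here is taken over all pairs $(k,l)$. Therefore ($\pi_i = \mathbf{P}(X_1 = i)$)

$$\log \mathbf{P}(C_n) = \log \pi_{X_1} + \sum_{k,l} m_{(k,l)}(n-1) \log p_{kl},$$

$$\frac{1}{n} \log \mathbf{P}(C_n) \xrightarrow{a.s.} \sum_{k,l} \pi_k p_{kl} \log p_{kl} = -H.$$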
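The convergence asserted in Theorem 14.2.1 can be observed on a simulated trajectory. This is a sketch under assumptions: the two-state transition matrix `P`, its stationary distribution `pi` (which is (0.8, 0.2) for this matrix), the path length and the seed are all illustrative choices.

```python
import math
import random

def chain_entropy(P, pi):
    """H = -sum_i pi_i sum_j p_ij log2 p_ij, in bits."""
    return -sum(pi[i] * p * math.log2(p)
                for i in range(len(P)) for p in P[i] if p > 0)

def information_rate(P, pi, n, rng):
    """Simulate X_1, ..., X_n from the stationary chain; return -log2 P(C_n) / n."""
    states = range(len(P))
    x = rng.choices(states, weights=pi)[0]        # X_1 has the stationary law
    log_prob = math.log2(pi[x])
    for _ in range(n - 1):
        y = rng.choices(states, weights=P[x])[0]  # one transition x -> y
        log_prob += math.log2(P[x][y])
        x = y
    return -log_prob / n

P = [[0.9, 0.1],
     [0.4, 0.6]]
pi = [0.8, 0.2]
H = chain_entropy(P, pi)
rate = information_rate(P, pi, 50_000, random.Random(0))
print(rate, H)
```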


14.2.3 The Asymptotic Behaviour of the Number of the Most 
Common Outcomes in a Sequence of Trials 


Theorem 14.2.1 has an important corollary. Rank all the messages (words) $C_n$ of
length $n$ according to the values of their probabilities in descending order. Next pick
the most probable words one by one until the sum of their probabilities exceeds a
prescribed level $\alpha$, $0 < \alpha < 1$. Denote the number (and also the set) of the selected
words by $M_\alpha(n)$.


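Theorem 14.2.2 For each $0 < \alpha < 1$, there exists one and the same limit

$$\lim_{n \to \infty} \frac{\log M_\alpha(n)}{n} = H.$$

Proof Let $\delta > 0$ be a number, which can be arbitrarily small. We will say that $C_n$
falls into category $K_1$ if its probability $\mathbf{P}(C_n) > 2^{-n(H-\delta)}$, and into category $K_2$ if

$$2^{-n(H+\delta)} < \mathbf{P}(C_n) \le 2^{-n(H-\delta)}.$$

Finally, $C_n$ belongs to the third category $K_3$ if

$$\mathbf{P}(C_n) \le 2^{-n(H+\delta)}.$$

Since, by Theorem 14.2.1, $\mathbf{P}(C_n \in K_1 \cup K_3) \to 0$ as $n \to \infty$, for all sufficiently
large $n$ the set $M_\alpha(n)$ contains only the words from $K_1$ and $K_2$, and the last word
from $M_\alpha(n)$ (i.e. the one having the smallest probability), which we denote by
$C_{\alpha,n}$, belongs to $K_2$. This means that

$$M_\alpha(n)\, 2^{-n(H+\delta)} \le \sum_{C_n \in M_\alpha(n)} \mathbf{P}(C_n) \le \alpha + \mathbf{P}(C_{\alpha,n}) \le \alpha + 2^{-n(H-\delta)}.$$

This implies

$$\frac{\log M_\alpha(n)}{n} \le \frac{\log(\alpha + 2^{-n(H-\delta)})}{n} + H + \delta.$$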




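Since $\delta$ is arbitrary, we have

$$\limsup_{n \to \infty} \frac{\log M_\alpha(n)}{n} \le H.$$

On the other hand, the words from $K_2$ belonging to $M_\alpha(n)$ have total probability
$\ge \alpha - \mathbf{P}(K_1)$. If $M_\alpha^{(2)}(n)$ is the number of these messages then

$$M_\alpha^{(2)}(n)\, 2^{-n(H-\delta)} \ge \alpha - \mathbf{P}(K_1),$$

and, consequently,

$$M_\alpha(n)\, 2^{-n(H-\delta)} \ge \alpha - \mathbf{P}(K_1).$$

Since $\mathbf{P}(K_1) \to 0$ as $n \to \infty$, for sufficiently large $n$ one has

$$\frac{\log M_\alpha(n)}{n} \ge H - \delta + \frac{1}{n} \log \frac{\alpha}{2}.$$

It follows that

$$\liminf_{n \to \infty} \frac{\log M_\alpha(n)}{n} \ge H.$$

The theorem is proved.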
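For a sequence of independent binary symbols (the simplest special case of a Markov chain) the quantity $M_\alpha(n)$ can be computed exactly, because all words with the same number of ones are equiprobable. The following is a minimal sketch under that assumption, with $\mathbf{P}(1) = p < 1/2$; the function names are hypothetical, and exact integers are used for word counts to avoid floating-point overflow.

```python
import math
from fractions import Fraction

def entropy_bits(p):
    """Entropy of one binary symbol with P(1) = p, in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def log2_m_alpha(n, p, alpha):
    """log2 of M_alpha(n) for i.i.d. bits with P(1) = p < 1/2.

    Words are taken in descending order of probability, i.e. in ascending
    order of their number of ones k, until the total probability exceeds alpha.
    """
    q = 1.0 - p
    total = 0.0   # probability of the words selected so far
    count = 0     # number of the words selected so far (exact integer)
    for k in range(n + 1):
        group_size = math.comb(n, k)   # number of words with k ones
        log_group_prob = math.log(group_size) + k * math.log(p) + (n - k) * math.log(q)
        group_prob = math.exp(log_group_prob)
        if total + group_prob <= alpha:
            total += group_prob
            count += group_size
            continue
        # Only a part of this group is needed to exceed the level alpha.
        fraction_needed = Fraction((alpha - total) / group_prob)
        extra = min(group_size, math.floor(group_size * fraction_needed) + 1)
        return math.log2(count + extra)
    return math.log2(count)

H = entropy_bits(0.2)
for n in (100, 1000, 4000):
    print(n, [round(log2_m_alpha(n, 0.2, a) / n, 4) for a in (0.1, 0.5, 0.9)], round(H, 4))
```

The printed ratios approach the same value $H$ for every level $\alpha$, although the convergence is slow for $\alpha$ far from $1/2$.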


Now one can obtain a useful interpretation of this theorem. Let $N$ be the number
of the chain states. Suppose for simplicity's sake that $N = 2^m$. Then the number of
different words of length $n$ (chains $C_n$) will be equal to $N^n = 2^{nm}$. Suppose, further,
that these words are transmitted using a binary code, so that $m$ binary symbols
are used to code every state. Thus, with such a transmission method (we will call it
direct coding), the length of the messages will be equal to $nm$. (For example, one
can use Markov chains to model the Russian language and take $N = 32$, $m = 5$.)
The assertion of Theorem 14.2.2 means that, for large $n$, with probability $1 - \varepsilon$,
$\varepsilon > 0$, only $2^{nH}$ of the totality of $2^{nm}$ words will be transmitted. The probability
of transmitting all the remaining words will be small if $\varepsilon$ is small. From this it is
easy to establish the existence of another more economical code requiring, with a
large probability, a smaller number of digits to transmit a word. Indeed, one can
enumerate the selected $2^{nH}$ most likely words using, say, a binary code again, and
then transmit only the number of the word. This clearly requires only $nH$ digits.
Since we always have $H \le \log N = m$, the length of the message will be $m/H \ge 1$
times smaller.

This is a special case of the so-called basic coding theorem for Markov chains:
for large $n$, there exists a code for which, with a high probability, the original
message $C_n$ can be transmitted by a sequence of signals which is $m/H$ times shorter
than in the case of the direct coding.
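The idea of transmitting only the number of a word can be tried on a toy source. The sketch below assumes independent binary symbols ($m = 1$) with an illustrative probability `p_one` of the symbol 1; it enumerates all words, keeps the most probable ones with total probability at least $1 - \varepsilon$, and counts the binary digits needed to index them. It is an oversimplified illustration, not an efficient code.

```python
import math
from itertools import product

def index_code_length(p_one, n, eps):
    """Binary digits needed to number the most likely words of length n
    whose total probability is at least 1 - eps."""
    q = 1.0 - p_one
    word_probs = sorted(
        (p_one ** sum(w) * q ** (n - sum(w)) for w in product((0, 1), repeat=n)),
        reverse=True,
    )
    total, count = 0.0, 0
    for prob in word_probs:
        total += prob
        count += 1
        if total >= 1.0 - eps:
            break
    return math.ceil(math.log2(count))

n = 12
bits = index_code_length(0.1, n, 0.05)
print("direct coding:", n, "digits; index coding:", bits, "digits")
```

With probability at least $1 - \varepsilon$ the transmitted word is among the selected ones, so its index suffices; the remaining rare words would need a separate escape mechanism, which is not modelled here.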

The above coding method is rather an oversimplified example than a recipe for 
efficiently compressing the messages. It should be noted that finding a really ef- 
ficient coding method is a rather difficult task. For example, in Morse code it is 
reasonable to encode more frequent letters by shorter sequences of dots and dashes. 




However, the text reduction by m/H times would not be achieved. Certain compres- 
sion techniques have been used in this book as well. For example, we replaced the 
frequently encountered words “characteristic function” by “ch.f.” We could achieve
better results if, say, shorthand was used. The structure of a code with a high com- 
pression coefficient will certainly be very complicated. The theorems of the present 
chapter give an upper bound for the results we can achieve. 

Since $H = \log N = m$ for a sequence of independent equiprobable symbols,
such a text is incompressible. This is why the proximity of “new” messages
(encoded using a new alphabet) to a sequence of equiprobable symbols could serve
as a criterion for constructing new codes.

It should be taken into account, however, that the text “redundancy” we are 
“fighting” with is in many cases a useful and helpful phenomenon. Without such 
redundancy, it would be impossible to detect misprints or reconstruct omissions as 
easily as we, say, restore the letter “r” in the word “info-mation”.

The reader might know how difficult it is to read a highly abridged and formalised 
mathematical text. While working with an ideal code no errors would be admissible 
(even if we could find any), since it is impossible to reconstruct an omitted or dis- 
torted symbol in a sequence of equiprobable digits. In this connection, there arises 
one of the basic problems of information theory: to find a code with the smallest 
“redundancy” which still allows one to eliminate the transmission noise. 


Chapter 15 
Martingales 


Abstract The definitions, simplest properties and first examples of martingales and 
sub/super-martingales are given in Sect. 15.1. Stopping (Markov) times are intro- 
duced in Sect. 15.2, which also contains Doob’s theorem on random change of time 
and Wald’s identity together with a number of its applications to boundary crossing 
problems and elsewhere. This is followed by Sect. 15.3 presenting fundamental mar- 
tingale inequalities, including Doob’s inequality with a number of its consequences, 
and an inequality for the number of strip crossings. Section 15.4 begins with Doob’s 
martingale convergence theorem and also presents Lévy’s theorem and an applica- 
tion to branching processes. Section 15.5 derives several important inequalities for 
the moments of stochastic sequences. 


15.1 Definitions, Simplest Properties, and Examples 


In Chap. 13 we considered sequences of dependent random variables $X_0, X_1, \dots$
forming Markov chains. Dependence was described there in terms of transition
probabilities determining the distribution of $X_{n+1}$ given $X_n$. That enabled us to
investigate rather completely the properties of Markov chains.

In this chapter we consider another type of sequence of dependent random
variables. Now dependence will be characterised only by the mean value of $X_{n+1}$ given
the whole “history” $X_0, \dots, X_n$. It turns out that one can also obtain rather general
results for such sequences.

Let a probability space $(\Omega, \mathfrak{F}, \mathbf{P})$ be given together with a sequence of random
variables $X_0, X_1, \dots$ defined on it and an increasing family (or flow) of $\sigma$-algebras
$\{\mathfrak{F}_n\}_{n \ge 0}$: $\mathfrak{F}_0 \subseteq \mathfrak{F}_1 \subseteq \dots \subseteq \mathfrak{F}_n \subseteq \dots \subseteq \mathfrak{F}$.


Definition 15.1.1 A sequence of pairs $\{X_n, \mathfrak{F}_n;\ n \ge 0\}$ is called a stochastic
sequence if, for each $n \ge 0$, $X_n$ is $\mathfrak{F}_n$-measurable. A stochastic sequence is said to
be a martingale (one also says that $\{X_n\}$ is a martingale with respect to the flow of
$\sigma$-algebras $\{\mathfrak{F}_n\}$) if, for every $n \ge 0$,

(1)
$$\mathbf{E}|X_n| < \infty, \tag{15.1.1}$$


A.A. Borovkov, Probability Theory, Universitext, 457 
DOI 10.1007/978-1-4471-5201-9_15, © Springer-Verlag London 2013 




(2) $X_n$ is measurable with respect to $\mathfrak{F}_n$,

(3)
$$\mathbf{E}(X_{n+1} | \mathfrak{F}_n) = X_n. \tag{15.1.2}$$

A stochastic sequence $\{X_n, \mathfrak{F}_n;\ n \ge 0\}$ is called a submartingale (supermartingale)
if conditions (1)-(3) hold with the sign “=” replaced in (15.1.2) with “≥”
(“≤”, respectively).

We will say that a sequence $\{X_n\}$ forms a martingale (submartingale,
supermartingale) if, for $\mathfrak{F}_n = \sigma(X_0, \dots, X_n)$, the pairs $\{X_n, \mathfrak{F}_n\}$ form a sequence with
the same name. Submartingales and supermartingales are often called semimartingales.

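It is evident that relation (15.1.2) persists if we replace $X_{n+1}$ on its left-hand side
with $X_m$ for any $m > n$. Indeed, by virtue of the properties of conditional
expectations,

$$\mathbf{E}(X_m | \mathfrak{F}_n) = \mathbf{E}\big[\mathbf{E}(X_m | \mathfrak{F}_{m-1}) \big| \mathfrak{F}_n\big] = \mathbf{E}(X_{m-1} | \mathfrak{F}_n) = \dots = X_n.$$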


A similar assertion holds for semimartingales. 
If $\{X_n\}$ is a martingale, then $\mathbf{E}(X_{n+1} | \sigma(X_0, \dots, X_n)) = X_n$, and, by a property
of conditional expectations,

$$\mathbf{E}(X_{n+1} | \sigma(X_n)) = \mathbf{E}\big[\mathbf{E}(X_{n+1} | \sigma(X_0, \dots, X_n)) \big| \sigma(X_n)\big] = \mathbf{E}(X_n | \sigma(X_n)) = X_n.$$

So, for martingales, as for Markov chains, we have

$$\mathbf{E}(X_{n+1} | \sigma(X_0, \dots, X_n)) = \mathbf{E}(X_{n+1} | \sigma(X_n)).$$

The similarity, however, is limited to this relation, because for a martingale, the
equality does not hold for distributions, but the additional condition

$$\mathbf{E}(X_{n+1} | \sigma(X_n)) = X_n$$

is imposed.


Example 15.1.1 Let $\xi_n$, $n \ge 0$, be independent. Then $X_n = \xi_0 + \dots + \xi_n$ form a
martingale (submartingale, supermartingale) if $\mathbf{E}\xi_n = 0$ ($\mathbf{E}\xi_n \ge 0$, $\mathbf{E}\xi_n \le 0$). It is
obvious that $X_n$ also form a Markov chain. The same is true of $X_n = \prod_{k=0}^{n} \xi_k$ if
$\mathbf{E}\xi_n = 1$.


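Example 15.1.2 Let $\xi_n$, $n \ge 0$, be independent. Then

$$X_n = \sum_{k=1}^{n} \xi_{k-1}\xi_k, \quad n \ge 1, \qquad X_0 = 0,$$

form a martingale if $\mathbf{E}\xi_n = 0$, because

$$\mathbf{E}(X_{n+1} | \sigma(X_0, \dots, X_n)) = X_n + \mathbf{E}(\xi_n \xi_{n+1} | \sigma(\xi_n)) = X_n.$$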
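The martingale property in this example can be checked by exhaustive enumeration in a simple special case. The sketch below assumes that the $\xi_k$ are independent fair signs $\pm 1$ and that $X_0 = 0$; it conditions on the full history $\xi_0, \dots, \xi_n$, which is finer than $\sigma(X_0, \dots, X_n)$, so the martingale property with respect to the latter follows by the tower property. The function names are hypothetical.

```python
from itertools import product

def x_path(xis):
    """Return [X_0, ..., X_n] with X_0 = 0 and X_n = sum_{k=1}^n xi_{k-1} xi_k."""
    xs = [0]
    for k in range(1, len(xis)):
        xs.append(xs[-1] + xis[k - 1] * xis[k])
    return xs

def max_martingale_defect(n):
    """Largest |E(X_{n+1} | xi_0, ..., xi_n) - X_n| over all sign histories."""
    worst = 0.0
    for history in product((-1, 1), repeat=n + 1):
        x_n = x_path(history)[-1]
        # Average X_{n+1} over the two equally likely values of xi_{n+1}.
        cond_mean = sum(x_path(history + (nxt,))[-1] for nxt in (-1, 1)) / 2
        worst = max(worst, abs(cond_mean - x_n))
    return worst

print(max_martingale_defect(6))
```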




Clearly, {X,,} is not a Markov chain here. An example of a sequence which is a 
Markov chain but not a martingale can be obtained, say, if we consider a random 
walk on a segment with reflection at the endpoints (see Example 13.1.1). 

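As well as $\{0, 1, \dots\}$ we will use other sets of indices for $X_n$, for example,
$\{-\infty < n < \infty\}$ or $\{n \le -1\}$, and also sets of integers including infinite values
$\pm\infty$, say, $\{0 \le n \le \infty\}$. We will denote these sets by a common symbol $\mathcal{N}$ and
write martingales (semimartingales) as $\{X_n, \mathfrak{F}_n;\ n \in \mathcal{N}\}$. By $\mathfrak{F}_{-\infty}$ we will
understand the $\sigma$-algebra $\bigcap_{n \in \mathcal{N}} \mathfrak{F}_n$, and by $\mathfrak{F}_{\infty}$ the $\sigma$-algebra
$\sigma\big(\bigcup_{n \in \mathcal{N}} \mathfrak{F}_n\big)$ generated by $\bigcup_{n \in \mathcal{N}} \mathfrak{F}_n$, so that
$\mathfrak{F}_{-\infty} \subseteq \mathfrak{F}_n \subseteq \mathfrak{F}_{\infty} \subseteq \mathfrak{F}$ for any $n \in \mathcal{N}$.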


Definition 15.1.2 A stochastic sequence $\{X_n, \mathfrak{F}_n;\ n \in \mathcal{N}\}$ is called a martingale
(submartingale, supermartingale), if the conditions of Definition 15.1.1 hold for
any $n \in \mathcal{N}$.


If $\{X_n, \mathfrak{F}_n;\ n \in \mathcal{N}\}$ is a martingale and the left boundary $n_0$ of $\mathcal{N}$ is finite (for
example, $\mathcal{N} = \{0, 1, \dots\}$), then the martingale $\{X_n, \mathfrak{F}_n\}$ can always be extended “to the
whole axis” by setting $\mathfrak{F}_n := \mathfrak{F}_{n_0}$ and $X_n := X_{n_0}$ for $n < n_0$. The same holds for the
right boundary as well. Therefore if a martingale (semimartingale) $\{X_n, \mathfrak{F}_n;\ n \in \mathcal{N}\}$
is given, then without loss of generality we can always assume that one is actually
given a martingale (semimartingale) $\{X_n, \mathfrak{F}_n;\ -\infty \le n \le \infty\}$.


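Example 15.1.3 Let $\{\mathfrak{F}_n,\ -\infty \le n \le \infty\}$ be a given sequence of increasing
$\sigma$-algebras, and $\xi$ a random variable on $(\Omega, \mathfrak{F}, \mathbf{P})$, $\mathbf{E}|\xi| < \infty$. Then
$\{X_n, \mathfrak{F}_n;\ -\infty \le n \le \infty\}$ with $X_n = \mathbf{E}(\xi | \mathfrak{F}_n)$ forms a martingale.

Indeed, by the property of conditional expectations, for any $m \le \infty$ and $m > n$,

$$\mathbf{E}(X_m | \mathfrak{F}_n) = \mathbf{E}\big[\mathbf{E}(\xi | \mathfrak{F}_m) \big| \mathfrak{F}_n\big] = \mathbf{E}(\xi | \mathfrak{F}_n) = X_n.$$

Definition 15.1.3 The martingale of Example 15.1.3 is called a martingale generated
by the random variable $\xi$ (and the family $\{\mathfrak{F}_n\}$).

Definition 15.1.4 A set $\mathcal{N}_+$ is called the right closure of $\mathcal{N}$ if:

(1) $\mathcal{N}_+ = \mathcal{N}$ when the maximal element of $\mathcal{N}$ is finite;
(2) $\mathcal{N}_+ = \mathcal{N} \cup \{\infty\}$ if $\mathcal{N}$ is not bounded from the right.

If $\mathcal{N} = \mathcal{N}_+$ then we say that $\mathcal{N}$ is right closed. A martingale (semimartingale)
$\{X_n, \mathfrak{F}_n;\ n \in \mathcal{N}\}$ is said to be right closed if $\mathcal{N}$ is right closed.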


Lemma 15.1.1 A martingale $\{X_n, \mathfrak{F}_n;\ n \in \mathcal{N}\}$ is generated by a random variable if
and only if it is right closed.

The Proof of the lemma is trivial. In one direction it follows from Example 15.1.3,
and in the other from the equality

$$\mathbf{E}(X_N | \mathfrak{F}_n) = X_n, \quad N = \sup\{k;\ k \in \mathcal{N}\},$$

which implies that $\{X_n, \mathfrak{F}_n\}$ is generated by $X_N$. The lemma is proved.




Now we consider an interesting and more concrete example of a martingale gen- 
erated by a random variable. 


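Example 15.1.4 Let $\xi_1, \xi_2, \dots$ be independent and identically distributed and
assume $\mathbf{E}|\xi_1| < \infty$. Set

$$S_n = \xi_1 + \dots + \xi_n, \quad X_{-n} = S_n/n, \quad
\mathfrak{F}_{-n} = \sigma(S_n, S_{n+1}, \dots) = \sigma(S_n, \xi_{n+1}, \dots).$$

Then $\mathfrak{F}_{-n} \subseteq \mathfrak{F}_{-n+1}$ and, for any $1 \le k \le n$, by symmetry

$$\mathbf{E}(\xi_k | \mathfrak{F}_{-n}) = \mathbf{E}(\xi_1 | \mathfrak{F}_{-n}).$$

From this it follows that

$$S_n = \mathbf{E}(S_n | \mathfrak{F}_{-n}) = \sum_{k=1}^{n} \mathbf{E}(\xi_k | \mathfrak{F}_{-n}) = n\,\mathbf{E}(\xi_1 | \mathfrak{F}_{-n}), \qquad
\frac{S_n}{n} = \mathbf{E}(\xi_1 | \mathfrak{F}_{-n}).$$

This means that $\{X_n, \mathfrak{F}_n;\ n \le -1\}$ forms a martingale generated by $\xi_1$.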


We will now obtain a series of auxiliary assertions giving the simplest properties
of martingales and semimartingales. When considering semimartingales, we will
confine ourselves to submartingales only, since the corresponding properties of
supermartingales will follow immediately if one considers the sequence $Y_n = -X_n$,
where $\{X_n\}$ is a submartingale.


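Lemma 15.1.2

(1) The property that $\{X_n, \mathfrak{F}_n;\ n \in \mathcal{N}\}$ is a martingale is equivalent to invariability
in $m \ge n$ of the set functions (integrals)

$$\mathbf{E}(X_m; A) = \mathbf{E}(X_n; A) \tag{15.1.3}$$

for any $A \in \mathfrak{F}_n$. In particular, $\mathbf{E}X_n = \text{const}$.
(2) The property that $\{X_n, \mathfrak{F}_n;\ n \in \mathcal{N}\}$ is a submartingale is equivalent to the
monotone increase in $m \ge n$ of the set functions

$$\mathbf{E}(X_m; A) \ge \mathbf{E}(X_n; A) \tag{15.1.4}$$

for every $A \in \mathfrak{F}_n$. In particular, $\mathbf{E}X_n \uparrow$.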


The Proof follows immediately from the definitions. If (15.1.3) holds then, by the
definition of conditional expectation, $X_n = \mathbf{E}(X_m | \mathfrak{F}_n)$, and vice versa. Now let
(15.1.4) hold. Put $Y_n = \mathbf{E}(X_m | \mathfrak{F}_n)$. Then (15.1.4) implies that $\mathbf{E}(Y_n; A) \ge \mathbf{E}(X_n; A)$
and $\mathbf{E}(Y_n - X_n; A) \ge 0$ for any $A \in \mathfrak{F}_n$. From this it follows that
$Y_n = \mathbf{E}(X_m | \mathfrak{F}_n) \ge X_n$ with probability 1. The converse assertion can be obtained as
easily as the direct one. The lemma is proved.


Lemma 15.1.3 Let $\{X_n, \mathfrak{F}_n;\ n \in \mathcal{N}\}$ be a martingale, $g(x)$ be a convex function,
and $\mathbf{E}|g(X_n)| < \infty$. Then $\{g(X_n), \mathfrak{F}_n;\ n \in \mathcal{N}\}$ is a submartingale.

If, in addition, $g(x)$ is nondecreasing, then the assertion of the lemma remains
true when $\{X_n, \mathfrak{F}_n;\ n \in \mathcal{N}\}$ is a submartingale.




The Proof of both assertions follows immediately from Jensen's inequality

$$\mathbf{E}(g(X_{n+1}) | \mathfrak{F}_n) \ge g(\mathbf{E}(X_{n+1} | \mathfrak{F}_n)) \ge g(\mathbf{E}(X_n | \mathfrak{F}_n)) = g(X_n).$$

Clearly, the function $g(x) = |x|^p$ for $p \ge 1$ satisfies the conditions of the first
part of the lemma, and the function $g(x) = e^{\lambda x}$ for $\lambda > 0$ meets the conditions of
the second part of the lemma.


Lemma 15.1.4 Let $\{X_n, \mathfrak{F}_n;\ n \in \mathcal{N}\}$ be a right closed submartingale. Then, for
$X_n(a) = \max\{X_n, a\}$ and any $a$, $\{X_n(a), \mathfrak{F}_n;\ n \in \mathcal{N}\}$ is a uniformly integrable
submartingale.

If $\{X_n, \mathfrak{F}_n;\ n \in \mathcal{N}\}$ is a right closed martingale, then it is uniformly integrable.


Proof Let $N := \sup\{k:\ k \in \mathcal{N}\}$. Then, by Lemma 15.1.3, $\{X_n(a), \mathfrak{F}_n;\ n \in \mathcal{N}\}$ is
a submartingale. Hence, for any $c > 0$,

$$c\,\mathbf{P}(X_n(a) > c) \le \mathbf{E}(X_n(a);\ X_n(a) > c) \le \mathbf{E}(X_N(a);\ X_n(a) > c) \le \mathbf{E}X_N^+(a)$$

(here $X^+ = \max(0, X)$) and so

$$\mathbf{P}(X_n(a) > c) \le \frac{1}{c}\,\mathbf{E}X_N^+(a) \to 0$$

uniformly in $n$ as $c \to \infty$. Therefore we get the required uniform integrability:

$$\sup_n \mathbf{E}(X_n(a);\ X_n(a) > c) \le \sup_n \mathbf{E}(X_N(a);\ X_n(a) > c) \to 0,$$

since $\sup_n \mathbf{P}(X_n(a) > c) \to 0$ as $c \to \infty$ (see Lemma A3.2.3 in Appendix 3; by
truncating at the level $a$ we avoided estimating the “negative tails”).

If $\{X_n, \mathfrak{F}_n;\ n \in \mathcal{N}\}$ is a martingale, then its uniform integrability will follow from
the first assertion of the lemma applied to the submartingale $\{|X_n|, \mathfrak{F}_n;\ n \in \mathcal{N}\}$.
The lemma is proved.


The nature of martingales can be clarified to some extent by the following exam- 
ple. 


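Example 15.1.5 Let $\xi_1, \xi_2, \dots$ be an arbitrary sequence of random variables,
$\mathbf{E}|\xi_k| < \infty$, $\mathfrak{F}_n = \sigma(\xi_1, \dots, \xi_n)$ for $n \ge 1$, $\mathfrak{F}_0 = (\varnothing, \Omega)$ (the trivial $\sigma$-algebra),

$$S_n = \sum_{k=1}^{n} \xi_k, \qquad Z_n = \sum_{k=1}^{n} \mathbf{E}(\xi_k | \mathfrak{F}_{k-1}), \qquad X_n = S_n - Z_n.$$

Then $\{X_n, \mathfrak{F}_n;\ n \ge 1\}$ is a martingale. This is a consequence of the fact that

$$\mathbf{E}(S_{n+1} - Z_{n+1} | \mathfrak{F}_n) = \mathbf{E}\big(X_n + \xi_{n+1} - \mathbf{E}(\xi_{n+1} | \mathfrak{F}_n) \big| \mathfrak{F}_n\big) = X_n.$$

In other words, for an arbitrary sequence $\{\xi_n\}$, the sequence $S_n$ can be
“compensated” by a so-called “predictable” (in the sense that its value is determined by
$S_1, \dots, S_{n-1}$) sequence $Z_n$ so that $S_n - Z_n$ will be a martingale.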
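The compensation can be carried out explicitly when the conditional means are known. The sketch below uses a hypothetical autoregressive model $\xi_k = \tfrac12 \xi_{k-1} + \varepsilon_k$ with independent fair signs $\varepsilon_k = \pm 1$ and a fixed starting value $\xi_0 = 1$, so that $\mathbf{E}(\xi_k | \mathfrak{F}_{k-1}) = \tfrac12 \xi_{k-1}$. The model and the function names are assumptions made for illustration only.

```python
import random

def compensated_path(xis, cond_means):
    """Return X_n = S_n - Z_n, where Z_n sums the conditional means E(xi_k | F_{k-1})."""
    s = z = 0.0
    xs = []
    for xi, mean in zip(xis, cond_means):
        s += xi      # S_n
        z += mean    # Z_n, determined by the past of the sequence
        xs.append(s - z)
    return xs

def simulate(n, rng):
    """Generate xi_1..xi_n, their conditional means, and the noise eps_1..eps_n."""
    prev = 1.0       # xi_0, a fixed starting value (assumption)
    xis, means, noise = [], [], []
    for _ in range(n):
        eps = rng.choice((-1.0, 1.0))
        means.append(0.5 * prev)   # E(xi_k | F_{k-1}) in this model
        prev = 0.5 * prev + eps
        xis.append(prev)
        noise.append(eps)
    return xis, means, noise

xis, means, noise = simulate(50, random.Random(0))
xs = compensated_path(xis, means)
print(xs[-1], sum(noise))
```

In this model the compensated sequence is just the sum of the noise terms, which is a martingale by Example 15.1.1.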




15.2 The Martingale Property and Random Change of Time. 
Wald’s Identity 


Throughout this section we assume that $\mathcal{N} = \{n \ge 0\}$. Recall the definition of a
stopping time.


Definition 15.2.1 A random variable $\nu$ will be called a stopping time or a Markov
time (with respect to an increasing family of $\sigma$-algebras $\{\mathfrak{F}_n;\ n \ge 0\}$) if, for any
$n \ge 0$, $\{\nu \le n\} \in \mathfrak{F}_n$.


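It is obvious that a constant $\nu = m$ is a stopping time. If $\nu$ is a stopping time,
then, for any fixed $m$, $\nu(m) = \min(\nu, m)$ is also a stopping time, since for $n \ge m$
we have

$$\nu(m) \le m \le n, \qquad \{\nu(m) \le n\} = \Omega \in \mathfrak{F}_n,$$

and if $n < m$ then

$$\{\nu(m) \le n\} = \{\nu \le n\} \in \mathfrak{F}_n.$$

If $\nu$ is a stopping time, then

$$\{\nu = n\} = \{\nu \le n\} - \{\nu \le n-1\} \in \mathfrak{F}_n, \qquad
\{\nu \ge n\} = \Omega - \{\nu \le n-1\} \in \mathfrak{F}_{n-1}.$$

Conversely, if $\{\nu = n\} \in \mathfrak{F}_n$, then $\{\nu \le n\} \in \mathfrak{F}_n$ and therefore $\nu$ is a stopping time.

Let a martingale $\{X_n, \mathfrak{F}_n;\ n \ge 0\}$ be given. A typical example of a stopping time
is the time $\nu$ at which $X_n$ first hits a given measurable set $B$:

$$\nu = \inf\{n \ge 0:\ X_n \in B\}$$

($\nu = \infty$ if all $X_n \notin B$). Indeed,

$$\{\nu = n\} = \{X_0 \notin B, \dots, X_{n-1} \notin B,\ X_n \in B\} \in \mathfrak{F}_n.$$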


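If $\nu$ is a proper stopping time ($\mathbf{P}(\nu < \infty) = 1$), then $X_\nu$ is a random variable,
since

$$X_\nu = \sum_{n=0}^{\infty} X_n I(\nu = n).$$

By $\mathfrak{F}_\nu$ we will denote the $\sigma$-algebra of sets $A \in \mathfrak{F}$ such that $A \cap \{\nu = n\} \in \mathfrak{F}_n$,
$n = 0, 1, \dots$ This $\sigma$-algebra can be thought of as being generated by the events
$\{\nu \le n\} \cap B_n$, $n = 0, 1, \dots$, where $B_n \in \mathfrak{F}_n$. Clearly, $\nu$ and $X_\nu$ are $\mathfrak{F}_\nu$-measurable.
If $\nu_1$ and $\nu_2$ are two stopping times, then $\{\nu_2 \ge \nu_1\} \in \mathfrak{F}_{\nu_1}$ and
$\{\nu_2 \ge \nu_1\} \in \mathfrak{F}_{\nu_2}$, since
$\{\nu_2 \ge \nu_1\} = \bigcup_n \big[\{\nu_2 = n\} \cap \{\nu_1 \le n\}\big]$.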

We already know that if $\{X_n, \mathfrak{F}_n\}$ is a martingale then $\mathbf{E}X_n$ is constant for all $n$.
Will this property remain valid for $\mathbf{E}X_\nu$ if $\nu$ is a stopping time? From Wald's identity
we know that this is the case for the martingale from Example 15.1.1. In the general
case one has the following.




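Theorem 15.2.1 (Doob) Let $\{X_n, \mathfrak{F}_n;\ n \ge 0\}$ be a martingale (submartingale) and
$\nu_1$, $\nu_2$ be stopping times such that

$$\mathbf{E}|X_{\nu_i}| < \infty, \quad i = 1, 2, \tag{15.2.1}$$

$$\liminf_{n \to \infty} \mathbf{E}(|X_n|;\ \nu_2 \ge n) = 0. \tag{15.2.2}$$

Then, on the set $\{\nu_2 \ge \nu_1\}$,

$$\mathbf{E}(X_{\nu_2} | \mathfrak{F}_{\nu_1}) = X_{\nu_1} \quad (\ge X_{\nu_1}). \tag{15.2.3}$$

This theorem extends the martingale (submartingale) property to random time.

Corollary 15.2.1 If $\nu_2 = \nu \ge 0$ is an arbitrary stopping time, then putting $\nu_1 = n$
(also a stopping time) we have that, on the set $\nu \ge n$,

$$\mathbf{E}(X_\nu | \mathfrak{F}_n) = X_n, \qquad \mathbf{E}X_\nu = \mathbf{E}X_0,$$

or, which is the same, for any $A \in \mathfrak{F}_n \cap \{\nu \ge n\}$,

$$\mathbf{E}(X_\nu; A) = \mathbf{E}(X_n; A).$$

For submartingales substitute “=” by “≥”.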


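Proof of Theorem 15.2.1 To prove (15.2.3) it suffices to show that, for any $A \in \mathfrak{F}_{\nu_1}$,

$$\mathbf{E}(X_{\nu_2};\ A \cap \{\nu_2 \ge \nu_1\}) = \mathbf{E}(X_{\nu_1};\ A \cap \{\nu_2 \ge \nu_1\}). \tag{15.2.4}$$

Since the random variables $\nu_i$ are discrete, we just have to establish (15.2.4) for sets
$A_n = A \cap \{\nu_1 = n\} \in \mathfrak{F}_n$, $n = 0, 1, \dots$, i.e. to establish the equality

$$\mathbf{E}(X_{\nu_2};\ A_n \cap \{\nu_2 \ge n\}) = \mathbf{E}(X_n;\ A_n \cap \{\nu_2 \ge n\}). \tag{15.2.5}$$

Thus the proof is reduced to the case $\nu_1 = n$. We have

$$\mathbf{E}(X_n;\ A_n \cap \{\nu_2 \ge n\}) = \mathbf{E}(X_n;\ A_n \cap \{\nu_2 = n\}) + \mathbf{E}(X_n;\ A_n \cap \{\nu_2 \ge n+1\})$$

$$= \mathbf{E}(X_{\nu_2};\ A_n \cap \{\nu_2 = n\}) + \mathbf{E}(X_{n+1};\ A_n \cap \{\nu_2 \ge n+1\}).$$

Here we used the fact that $\{\nu_2 \ge n+1\} \in \mathfrak{F}_n$ and the martingale property (15.1.3).
Applying this equality $m - n$ times we obtain that

$$\mathbf{E}(X_{\nu_2};\ A_n \cap \{n \le \nu_2 < m\})
= \mathbf{E}(X_n;\ A_n \cap \{\nu_2 \ge n\}) - \mathbf{E}(X_m;\ A_n \cap \{\nu_2 \ge m\}). \tag{15.2.6}$$

By (15.2.2) the last expression converges to zero for some sequence $m \to \infty$.
Since

$$A_{n,m} := A_n \cap \{n \le \nu_2 < m\} \uparrow B_n = A_n \cap \{n \le \nu_2\},$$

by the property of integrals and by virtue of (15.2.6),

$$\mathbf{E}(X_{\nu_2};\ A_n \cap \{n \le \nu_2\}) = \lim_{m \to \infty} \mathbf{E}(X_{\nu_2};\ A_{n,m}) = \mathbf{E}(X_n;\ A_n \cap \{\nu_2 \ge n\}).$$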



Thus we proved equality (15.2.5) and hence Theorem 15.2.1 for martingales. The 
proof for submartingales can be obtained by simply changing the equality signs in 
certain places to inequalities. The theorem is proved. 


The conditions of Theorem 15.2.1 are far from always being met, even in rather
simple cases. Consider, for instance, a fair game (see Examples 4.2.3 and 4.4.5)
versus an infinitely rich adversary, in which $z + S_n$ is the fortune of the first
gambler after $n$ plays (given he has not been ruined yet). Here $z > 0$, $S_n = \sum_{k=1}^{n} \xi_k$,
$\mathbf{P}(\xi_k = \pm 1) = 1/2$, $\eta(z) = \min\{k:\ S_k = -z\}$ is obviously a Markov (stopping)
time, and the sequence $\{S_n;\ n \ge 0\}$, $S_0 = 0$, is a martingale, but $S_{\eta(z)} = -z$. Hence
$\mathbf{E}S_{\eta(z)} = -z \ne \mathbf{E}S_n = 0$, and equality (15.2.5) does not hold for $\nu_1 = 0$, $\nu_2 = \eta(z)$,
$z > 0$, $n = 0$. In this example, this means that condition (15.2.2) is not satisfied (this
is related to the fact that $\mathbf{E}\eta(z) = \infty$).
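The effect is easy to see in a simulation. The sketch below follows a fair $\pm 1$ walk up to the bounded stopping time $\min(\eta(z), n)$; the values of `z`, `n`, the number of paths and the seed are illustrative assumptions. For the bounded time the mean of the stopped value stays near $0$, although most paths have already been stopped at $-z$.

```python
import random

def stopped_value(z, n, rng):
    """Value S_{min(eta(z), n)} of a fair +-1 walk, eta(z) = first time S_k = -z."""
    s = 0
    for _ in range(n):
        if s == -z:
            break
        s += rng.choice((-1, 1))
    return s

rng = random.Random(1)
z, n, paths = 3, 100, 5000
values = [stopped_value(z, n, rng) for _ in range(paths)]
mean_stopped = sum(values) / paths                 # estimates E S_{min(eta, n)}
ruined = sum(v == -z for v in values) / paths      # estimates P(eta(z) <= n)
print(mean_stopped, ruined)
```

The mean is kept at zero by the comparatively few paths that have not yet hit $-z$ and sit at large positive values, which is exactly the contribution that condition (15.2.2) requires to vanish.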

Conditions (15.2.1) and (15.2.2) of Theorem 15.2.1 can, generally speaking, be 
rather hard to verify. Therefore the following statements are useful in applications. 

Put for brevity

$$\xi_n = X_n - X_{n-1}, \quad \xi_0 = X_0, \quad Y_n = \sum_{k=0}^{n} |\xi_k|, \quad n = 0, 1, \dots$$


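Lemma 15.2.1 The condition

$$\mathbf{E}Y_\nu < \infty \tag{15.2.7}$$

is sufficient for (15.2.1) and (15.2.2) (with $\nu_i = \nu$).

The Proof is almost evident since $|X_\nu| \le Y_\nu$ and

$$\mathbf{E}(|X_n|;\ \nu > n) \le \mathbf{E}(Y_\nu;\ \nu > n).$$

Because $\mathbf{P}(\nu > n) \to 0$ and $\mathbf{E}Y_\nu < \infty$, it remains to use the property of integrals by
which $\mathbf{E}(\eta; A_n) \to 0$ if $\mathbf{E}|\eta| < \infty$ and $\mathbf{P}(A_n) \to 0$.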


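We introduce the following notation:

$$a_n = \mathbf{E}(|\xi_n| \,|\, \mathfrak{F}_{n-1}), \quad \sigma_n^2 = \mathbf{E}(\xi_n^2 | \mathfrak{F}_{n-1}), \quad n = 0, 1, 2, \dots,$$

where $\mathfrak{F}_{-1}$ can be taken to be the trivial $\sigma$-algebra.

Theorem 15.2.2 Let $\{X_n;\ n \ge 0\}$ be a martingale (submartingale) and $\nu$ be a
stopping time (with respect to $\{\mathfrak{F}_n = \sigma(X_0, \dots, X_n)\}$).

(1) If

$$\mathbf{E}\nu < \infty \tag{15.2.8}$$

and, for all $n \ge 0$, on the set $\{\nu \ge n\} \in \mathfrak{F}_{n-1}$ one has

$$a_n \le c = \text{const}, \tag{15.2.9}$$

then

$$\mathbf{E}|X_\nu| < \infty, \quad \mathbf{E}X_\nu = \mathbf{E}X_0 \quad (\ge \mathbf{E}X_0). \tag{15.2.10}$$

(2) If, in addition, $\mathbf{E}\sigma_n^2 = \mathbf{E}\xi_n^2 < \infty$ then

$$\mathbf{E}X_\nu^2 = \mathbf{E}\sum_{k=0}^{\nu} \sigma_k^2. \tag{15.2.11}$$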


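Proof By virtue of Theorem 15.2.1, Corollary 15.2.1 and Lemma 15.2.1, to prove
(15.2.10) it suffices to verify that conditions (15.2.8) and (15.2.9) imply (15.2.7).
Quite similarly to the proof of Theorem 4.4.1, we have

$$\mathbf{E}Y_\nu = \sum_{n=0}^{\infty} \mathbf{E}\Big(\sum_{k=0}^{n} |\xi_k|;\ \nu = n\Big)
= \sum_{k=0}^{\infty} \sum_{n=k}^{\infty} \mathbf{E}(|\xi_k|;\ \nu = n)
= \sum_{k=0}^{\infty} \mathbf{E}(|\xi_k|;\ \nu \ge k).$$

Here $\{\nu \ge k\} = \Omega \setminus \{\nu \le k-1\} \in \mathfrak{F}_{k-1}$. Therefore, by condition (15.2.9),

$$\mathbf{E}(|\xi_k|;\ \nu \ge k) = \mathbf{E}\big(\mathbf{E}(|\xi_k| \,|\, \mathfrak{F}_{k-1});\ \nu \ge k\big) \le c\,\mathbf{P}(\nu \ge k).$$

This means that

$$\mathbf{E}Y_\nu \le c \sum_{k=0}^{\infty} \mathbf{P}(\nu \ge k) = c(1 + \mathbf{E}\nu) < \infty.$$

Now we will prove (15.2.11). Set $Z_n := X_n^2 - \sum_{k=0}^{n} \sigma_k^2$. One can easily see that $Z_n$
is a martingale, since

$$\mathbf{E}(X_{n+1}^2 - X_n^2 - \sigma_{n+1}^2 | \mathfrak{F}_n) = \mathbf{E}(2X_n\xi_{n+1} + \xi_{n+1}^2 - \sigma_{n+1}^2 | \mathfrak{F}_n) = 0.$$

It is also clear that $\mathbf{E}|Z_n| < \infty$ and $\nu(n) = \min(\nu, n)$ is a stopping time. By virtue of
Lemma 15.2.1, conditions (15.2.1) and (15.2.2) always hold for the pair $\{Z_k\}$, $\nu(n)$.
Therefore, by the first part of the theorem,

$$\mathbf{E}Z_{\nu(n)} = 0, \qquad \mathbf{E}X_{\nu(n)}^2 = \mathbf{E}\sum_{k=0}^{\nu(n)} \sigma_k^2. \tag{15.2.12}$$

It remains to verify that

$$\lim_{n \to \infty} \mathbf{E}X_{\nu(n)}^2 = \mathbf{E}X_\nu^2, \qquad
\lim_{n \to \infty} \mathbf{E}\sum_{k=0}^{\nu(n)} \sigma_k^2 = \mathbf{E}\sum_{k=0}^{\nu} \sigma_k^2. \tag{15.2.13}$$

The second equality follows from the monotone convergence theorem ($\nu(n) \uparrow \nu$,
$\sigma_k^2 \ge 0$). That theorem implies the former equality as well, for
$X_{\nu(n)}^2 \xrightarrow{a.s.} X_\nu^2$ and $\{X_{\nu(n)}^2\}$ is a submartingale. To verify the latter claim, note
that $\{X_n^2, \mathfrak{F}_n;\ n \ge 0\}$ is a submartingale, and therefore, for any $A \in \mathfrak{F}_n$,

$$\mathbf{E}(X_{\nu(n)}^2; A) = \mathbf{E}(X_\nu^2;\ A \cap \{\nu \le n\}) + \mathbf{E}(X_n^2;\ A \cap \{\nu > n\})$$

$$\le \mathbf{E}(X_\nu^2;\ A \cap \{\nu \le n\}) + \mathbf{E}\big(\mathbf{E}(X_{n+1}^2 | \mathfrak{F}_n);\ A \cap \{\nu > n\}\big)$$

$$= \mathbf{E}(X_\nu^2;\ A \cap \{\nu < n+1\}) + \mathbf{E}(X_{n+1}^2;\ A \cap \{\nu \ge n+1\})
= \mathbf{E}(X_{\nu(n+1)}^2; A).$$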



Thus (15.2.12) and (15.2.13) imply (15.2.11), and the theorem is completely 
proved. 


The main assertion of Theorem 15.2.2 for martingales (submartingales):

$$\mathbf{E}X_\nu = \mathbf{E}X_0 \quad (\ge \mathbf{E}X_0) \tag{15.2.14}$$


was obtained as a consequence of Theorem 15.2.1. However, we could get it directly 
from some rather transparent relations which, moreover, enable one to extend it to 
improper stopping times v. 

A stopping time $\nu$ is called improper if $0 < \mathbf{P}(\nu < \infty) = 1 - \mathbf{P}(\nu = \infty) < 1$. To give an example of an improper stopping time, consider independent identically distributed random variables $\xi_k$, $a = \mathbf{E}\xi_k < 0$, $X_n = \sum_{k=1}^{n} \xi_k$, and put

$$\nu = \eta(x) := \min\{k \ge 1 : X_k > x\}, \quad x \ge 0.$$

Here $\nu$ is finite only for such trajectories $\{X_k\}$ that $\sup_k X_k > x$. If the last inequality does not hold, we put $\nu = \infty$. Clearly,

$$\mathbf{P}(\nu = \infty) = \mathbf{P}\Big(\sup_k X_k \le x\Big) > 0.$$
k 


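Thus, for an arbitrary (possibly improper) stopping time, we have

$$\mathbf{E}(X_\nu;\ \nu < \infty) = \sum_{k=0}^{\infty} \mathbf{E}(X_k;\ \nu = k) = \sum_{k=0}^{\infty} \big[\mathbf{E}(X_k;\ \nu \ge k) - \mathbf{E}(X_k;\ \nu \ge k+1)\big]. \tag{15.2.15}$$

Assume now that changing the order of summation is justified here. Then, by virtue of the relation $\{\nu \ge k+1\} \in \mathfrak{F}_k$, we get

$$\begin{aligned}
\mathbf{E}(X_\nu;\ \nu < \infty) &= \mathbf{E}X_0 + \sum_{k=0}^{\infty} \mathbf{E}(X_{k+1} - X_k;\ \nu \ge k+1) \\
&= \mathbf{E}X_0 + \sum_{k=0}^{\infty} \mathbf{E}\big[\mathrm{I}(\nu \ge k+1)\,\mathbf{E}(X_{k+1} - X_k \mid \mathfrak{F}_k)\big].
\end{aligned} \tag{15.2.16}$$

Since for martingales (submartingales) the factors $\mathbf{E}(X_{k+1} - X_k \mid \mathfrak{F}_k) = 0$ ($\ge 0$), we obtain the following.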


Theorem 15.2.3 If the change of the order of summation in (15.2.15) and (15.2.16) is legitimate then, for martingales (submartingales),

$$\mathbf{E}(X_\nu;\ \nu < \infty) = \mathbf{E}X_0 \quad (\ge \mathbf{E}X_0). \tag{15.2.17}$$


Assumptions (15.2.8) and (15.2.9) of Theorem 15.2.2 are nothing else but conditions ensuring the absolute convergence of the series in (15.2.15) (see the proof of Theorem 15.2.2) and (15.2.16), because the sum of the absolute values of the terms in (15.2.16) is dominated by

$$\sum_{k=0}^{\infty} \mathbf{E}(a_{k+1};\ \nu \ge k+1) \le c \sum_{k=0}^{\infty} \mathbf{P}(\nu \ge k+1) = c\,\mathbf{E}\nu < \infty,$$

where, as before, $a_k = \mathbf{E}(|\xi_k| \mid \mathfrak{F}_{k-1})$ with $\xi_k = X_k - X_{k-1}$. This justifies the change of the order of summation.

There is still another way of proving (15.2.17) based on (15.2.15) specifying a simple condition ensuring the required justification. First note that identity (15.2.17) assumes that the expectation $\mathbf{E}(X_\nu;\ \nu < \infty)$ exists, i.e. both values $\mathbf{E}(X_\nu^{\pm};\ \nu < \infty)$ are finite, where $x^{\pm} = \max(\pm x, 0)$.


Theorem 15.2.4 1. Let $\{X_n, \mathfrak{F}_n\}$ be a martingale. Then the condition

$$\lim_{n\to\infty} \mathbf{E}(X_n;\ \nu > n) = 0 \tag{15.2.18}$$

is necessary and sufficient for the relation

$$\lim_{n\to\infty} \mathbf{E}(X_\nu;\ \nu \le n) = \mathbf{E}X_0. \tag{15.2.19}$$

A necessary and sufficient condition for (15.2.17) is that (15.2.18) holds and at least one of the values $\mathbf{E}(X_\nu^{\pm};\ \nu < \infty)$ is finite.

2. If $\{X_n, \mathfrak{F}_n\}$ is a supermartingale and

$$\liminf_{n\to\infty} \mathbf{E}(X_n;\ \nu > n) \ge 0, \tag{15.2.20}$$

then

$$\limsup_{n\to\infty} \mathbf{E}(X_\nu;\ \nu \le n) \le \mathbf{E}X_0.$$

If, in addition, at least one of the values $\mathbf{E}(X_\nu^{\pm};\ \nu < \infty)$ is finite then

$$\mathbf{E}(X_\nu;\ \nu < \infty) \le \mathbf{E}X_0.$$

3. If, in conditions (15.2.18) and (15.2.20), we replace the quantity $\mathbf{E}(X_n;\ \nu > n)$ with $\mathbf{E}(X_n;\ \nu \ge n)$, the first two assertions of the theorem will remain true.

The corresponding symmetric assertions hold for submartingales.


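Proof As we have already mentioned, for martingales, $\mathbf{E}(\xi_k;\ \nu \ge k) = 0$. Therefore, by virtue of (15.2.18),

$$\mathbf{E}X_0 = \lim_{n\to\infty} \Big[\mathbf{E}X_0 + \sum_{k=1}^{n} \mathbf{E}(\xi_k;\ \nu \ge k) - \mathbf{E}(X_n;\ \nu \ge n+1)\Big].$$

Here

$$\begin{aligned}
\sum_{k=1}^{n} \mathbf{E}(\xi_k;\ \nu \ge k) &= \sum_{k=1}^{n} \mathbf{E}(X_k;\ \nu \ge k) - \sum_{k=1}^{n} \mathbf{E}(X_{k-1};\ \nu \ge k) \\
&= \sum_{k=1}^{n} \mathbf{E}(X_k;\ \nu \ge k) - \sum_{k=0}^{n-1} \mathbf{E}(X_k;\ \nu \ge k+1).
\end{aligned}$$

Hence

$$\begin{aligned}
\mathbf{E}X_0 &= \lim_{n\to\infty} \sum_{k=0}^{n} \big[\mathbf{E}(X_k;\ \nu \ge k) - \mathbf{E}(X_k;\ \nu \ge k+1)\big] \\
&= \lim_{n\to\infty} \sum_{k=0}^{n} \mathbf{E}(X_k;\ \nu = k) = \lim_{n\to\infty} \mathbf{E}(X_\nu;\ \nu \le n).
\end{aligned}$$

These equalities also imply the necessity of condition (15.2.18).

If at least one of the values $\mathbf{E}(X_\nu^{\pm};\ \nu < \infty)$ is finite, then by the monotone convergence theorem

$$\begin{aligned}
\lim_{n\to\infty} \mathbf{E}(X_\nu;\ \nu \le n) &= \lim_{n\to\infty} \mathbf{E}(X_\nu^{+};\ \nu \le n) - \lim_{n\to\infty} \mathbf{E}(X_\nu^{-};\ \nu \le n) \\
&= \mathbf{E}(X_\nu^{+};\ \nu < \infty) - \mathbf{E}(X_\nu^{-};\ \nu < \infty) = \mathbf{E}(X_\nu;\ \nu < \infty).
\end{aligned}$$

The third assertion of the theorem follows from the fact that the stopping time $\nu(n) = \min(\nu, n)$ satisfies the conditions of the first part of the theorem (or those of Theorems 15.2.1 and 15.2.3), and therefore, for the martingale $\{X_n\}$,

$$\mathbf{E}X_0 = \mathbf{E}X_{\nu(n)} = \mathbf{E}(X_\nu;\ \nu < n) + \mathbf{E}(X_n;\ \nu \ge n),$$

so that (15.2.19) implies the convergence $\mathbf{E}(X_n;\ \nu \ge n) \to 0$ and vice versa.

The proof for semimartingales is similar. The theorem is proved.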


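That assertions (15.2.17) and (15.2.19) are, generally speaking, not equivalent even when (15.2.18) holds (i.e., $\lim_{n\to\infty} \mathbf{E}(X_\nu;\ \nu \le n) = \mathbf{E}(X_\nu;\ \nu < \infty)$ is not always the case), can be illustrated by the following example. Let $\xi_k$ be independent random variables with

$$\mathbf{P}(\xi_k = 3^k) = \mathbf{P}(\xi_k = -3^k) = 1/2,$$

$\nu$ be independent of $\{\xi_k\}$, and $\mathbf{P}(\nu = k) = 2^{-k}$, $k = 1, 2, \ldots$. Then $X_0 = 0$, $X_k = X_{k-1} + \xi_k$ for $k \ge 1$ is a martingale,

$$\mathbf{E}X_n = 0, \quad \mathbf{P}(\nu < \infty) = 1, \quad \mathbf{E}(X_n;\ \nu > n) = \mathbf{E}X_n\,\mathbf{P}(\nu > n) = 0$$

by independence, and condition (15.2.18) is satisfied. By virtue of (15.2.19), this means that $\lim_{n\to\infty} \mathbf{E}(X_\nu;\ \nu \le n) = 0$ (one can also verify this directly). On the other hand, the expectation $\mathbf{E}(X_\nu;\ \nu < \infty) = \mathbf{E}X_\nu$ is not defined, since $\mathbf{E}X_\nu^{+} = \mathbf{E}X_\nu^{-} = \infty$. Indeed, clearly

$$X_{k-1} \ge -\frac{3^k - 3}{2}, \quad \{\xi_k = 3^k\} \subset \Big\{X_k \ge \frac{3^k + 3}{2}\Big\}, \quad \mathbf{P}\Big(X_k \ge \frac{3^k + 3}{2}\Big) \ge \frac{1}{2},$$

$$\mathbf{E}X_k^{+} \ge \frac{3^k + 3}{4}, \quad \mathbf{E}X_\nu^{+} = \sum_{k=1}^{\infty} 2^{-k}\,\mathbf{E}X_k^{+} \ge \sum_{k=1}^{\infty} 2^{-k-2}\,3^k = \infty.$$

By symmetry, we also have $\mathbf{E}X_\nu^{-} = \infty$.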
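The example above can be checked by exact enumeration for small indices. The following is a minimal sketch, not part of the original text: the function names are hypothetical, and the enumeration over all sign patterns is only feasible for small k. It shows that the truncated signed expectation stays at zero while the truncated expectation of the positive part keeps growing.

```python
from fractions import Fraction
from itertools import product


def moments_of_x(k):
    """Exact (E X_k, E X_k^+) for X_k = xi_1 + ... + xi_k, P(xi_j = +-3^j) = 1/2."""
    mean = Fraction(0)
    mean_plus = Fraction(0)
    for signs in product((1, -1), repeat=k):
        x = sum(s * 3 ** j for j, s in enumerate(signs, start=1))
        mean += Fraction(x, 2 ** k)
        mean_plus += Fraction(max(x, 0), 2 ** k)
    return mean, mean_plus


def truncated_expectations(n):
    """Exact (E(X_nu; nu <= n), E(X_nu^+; nu <= n)) with P(nu = k) = 2^-k, nu independent of xi."""
    signed = Fraction(0)
    plus = Fraction(0)
    for k in range(1, n + 1):
        mean, mean_plus = moments_of_x(k)
        signed += Fraction(1, 2 ** k) * mean
        plus += Fraction(1, 2 ** k) * mean_plus
    return signed, plus


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, truncated_expectations(n))
```

The second component grows geometrically in n, which is the divergence of the positive part established in the example.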


Corollary 15.2.2 1. If $\{X_n, \mathfrak{F}_n\}$ is a nonnegative martingale, then condition (15.2.18) is necessary and sufficient for (15.2.17).

2. If $\{X_n, \mathfrak{F}_n\}$ is a nonnegative supermartingale and $\nu$ is an arbitrary stopping time, then

$$\mathbf{E}(X_\nu;\ \nu < \infty) \le \mathbf{E}X_0. \tag{15.2.21}$$


Proof The assertion follows in an obvious way from Theorem 15.2.4 since one has $\mathbf{E}(X_\nu^{-};\ \nu < \infty) = 0$.


Theorem 15.2.2 implies the already known Wald’s identity (see Theorem 4.4.3) 
supplemented with another useful statement. 


Theorem 15.2.5 (Wald's identity) Let $\xi_1, \xi_2, \ldots$ be independent identically distributed random variables, $S_n = \xi_1 + \cdots + \xi_n$, $S_0 = 0$, and assume $\mathbf{E}\xi_1 = a$. Let, further, $\nu$ be a stopping time with $\mathbf{E}\nu < \infty$. Then

$$\mathbf{E}S_\nu = a\,\mathbf{E}\nu. \tag{15.2.22}$$

If, moreover, $\sigma^2 = \mathrm{Var}\,\xi_k < \infty$, then

$$\mathbf{E}[S_\nu - \nu a]^2 = \sigma^2\,\mathbf{E}\nu. \tag{15.2.23}$$

Proof It is clear that $X_n = S_n - na$ forms a martingale and conditions (15.2.8) and (15.2.9) are met. Therefore $\mathbf{E}X_\nu = \mathbf{E}X_0 = 0$, which is equivalent to (15.2.22), and $\mathbf{E}X_\nu^2 = \sigma^2\,\mathbf{E}\nu$, which is equivalent to (15.2.23).
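Both identities can be verified exactly for a bounded stopping time. The sketch below is an illustration under stated assumptions, not the author's construction: it takes a simple random walk with steps ±1, stops it at the first exit from the interval (-z1, z2) or at a fixed horizon, whichever comes first, and propagates the exact distribution of the walk with rational arithmetic. The function name and parameters are hypothetical.

```python
from fractions import Fraction


def stopped_walk_moments(p, z1, z2, horizon):
    """Exact moments for nu = min(first exit time from (-z1, z2), horizon).

    Returns (a, var, E nu, E S_nu, E (S_nu - nu*a)^2) for steps +1 w.p. p, -1 w.p. 1-p.
    """
    q = 1 - p
    a = p - q                  # E xi
    var = 1 - a * a            # Var xi, since xi^2 = 1
    alive = {0: Fraction(1)}   # P(S_n = s, nu > n)
    e_nu = e_s = e_sq = Fraction(0)
    for n in range(1, horizon + 1):
        nxt = {}
        for s, mass in alive.items():
            for step, w in ((1, p), (-1, q)):
                t, m = s + step, mass * w
                if t >= z2 or t <= -z1 or n == horizon:
                    # the walk is stopped at time n in position t
                    e_nu += n * m
                    e_s += t * m
                    e_sq += (t - n * a) ** 2 * m
                else:
                    nxt[t] = nxt.get(t, Fraction(0)) + m
        alive = nxt
    return a, var, e_nu, e_s, e_sq


if __name__ == "__main__":
    a, var, e_nu, e_s, e_sq = stopped_walk_moments(Fraction(2, 5), 3, 4, 60)
    print(e_s == a * e_nu, e_sq == var * e_nu)
```

Because the stopping time is bounded, the identities hold with exact equality here; no limit argument is needed.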


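Example 15.2.1 Consider a generalised renewal process (see Sect. 10.6) $S(t) = S_{\eta(t)}$, where $S_n = \sum_{j=1}^{n} \xi_j$ (in this example we follow the notation of Chap. 10 and change the meaning of the notation $S_n$ from the above), $\eta(t) = \min\{k : T_k > t\}$, $T_k = \sum_{j=1}^{k} \tau_j$ and $(\tau_j, \xi_j)$ are independent vectors distributed as $(\tau, \xi)$, $\tau > 0$. Set $a_\xi = \mathbf{E}\xi$, $a = \mathbf{E}\tau$, $\sigma_\xi^2 = \mathrm{Var}\,\xi$ and $\sigma^2 = \mathrm{Var}\,\tau$. As we know from Wald's identity in Sect. 4.4,

$$\mathbf{E}\eta(t) = \frac{t + \mathbf{E}\chi(t)}{a}, \quad \mathbf{E}S(t) = a_\xi\,\mathbf{E}\eta(t),$$

where $\mathbf{E}\chi(t) = o(t)$ as $t \to \infty$ (see Theorem 10.1.1) and, in the non-lattice case, $\mathbf{E}\chi(t) \to \frac{\sigma^2 + a^2}{2a}$ if $\sigma^2 < \infty$ (see Theorem 10.4.3).

We now find $\mathrm{Var}\,\eta(t)$ and $\mathrm{Var}\,S(t)$. Omitting for brevity's sake the argument $t$, we can write

$$\begin{aligned}
a^2\,\mathrm{Var}\,\eta(t) = a^2\,\mathrm{Var}\,\eta &= \mathbf{E}(a\eta - a\mathbf{E}\eta)^2 = \mathbf{E}(a\eta - T_\eta + T_\eta - a\mathbf{E}\eta)^2 \\
&= \mathbf{E}(T_\eta - a\eta)^2 + \mathbf{E}(T_\eta - a\mathbf{E}\eta)^2 - 2\,\mathbf{E}(T_\eta - a\eta)(T_\eta - a\mathbf{E}\eta).
\end{aligned}$$

The first summand on the right-hand side is equal to

$$\sigma^2\,\mathbf{E}\eta = \frac{\sigma^2 t}{a} + O(1)$$

by Theorem 15.2.5 (see (15.2.23)). The second summand equals, by (10.4.8) ($\chi(t) = T_{\eta(t)} - t$),

$$\mathbf{E}(t + \chi(t) - a\mathbf{E}\eta)^2 = \mathbf{E}(\chi(t) - \mathbf{E}\chi(t))^2 \le \mathbf{E}\chi^2(t) = o(t).$$

The last summand, by the Cauchy–Bunjakovsky inequality, is also $o(t)$. Finally, we get

$$\mathrm{Var}\,\eta(t) = \frac{\sigma^2 t}{a^3} + o(t).$$

Consider now (with $r = a_\xi/a$; $\zeta_j = \xi_j - r\tau_j$, $\mathbf{E}\zeta_j = 0$)

$$\begin{aligned}
\mathrm{Var}\,S(t) &= \mathbf{E}(S_\eta - a_\xi\mathbf{E}\eta)^2 = \mathbf{E}\big[S_\eta - rT_\eta + r(T_\eta - a\mathbf{E}\eta)\big]^2 \\
&= \mathbf{E}\Big(\sum_{j=1}^{\eta} \zeta_j\Big)^2 + r^2\,\mathbf{E}(T_\eta - a\mathbf{E}\eta)^2 + 2r\,\mathbf{E}\Big(\sum_{j=1}^{\eta} \zeta_j\Big)(T_\eta - a\mathbf{E}\eta).
\end{aligned}$$

The first term on the right-hand side is equal to

$$\mathbf{E}\eta\,\mathrm{Var}\,\zeta = \frac{t\,\mathrm{Var}\,\zeta}{a} + O(1)$$

by Theorem 15.2.5 (see (15.2.23)). The second term has already been estimated above. Therefore, as before, the sum of the last two terms is $o(t)$. Thus

$$\mathrm{Var}\,S(t) = \frac{t}{a}\,\mathbf{E}(\xi - r\tau)^2 + o(t).$$

This corresponds to the scaling used in Theorem 10.6.2.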


Example 15.2.2 Examples 4.4.4 and 4.5.5 referring to the fair game situation with $\mathbf{P}(\xi_k = \pm 1) = 1/2$ and $\nu = \min\{k : S_k = z_2 \text{ or } S_k = -z_1\}$ ($z_1$ and $z_2$ being the capitals of the gamblers) can also illustrate the use of Theorem 15.2.5.

Now consider the case $p = \mathbf{P}(\xi_k = 1) \ne 1/2$. The sequence $X_n = (q/p)^{S_n}$, $n \ge 0$, $q = 1 - p$, is a martingale, since

$$\mathbf{E}(q/p)^{\xi_k} = p\,(q/p) + q\,(p/q) = 1.$$

By Theorem 15.2.5 (the probabilities $P_1$ and $P_2$ were defined in Example 4.4.5),

$$\mathbf{E}X_\nu = \mathbf{E}X_0 = 1, \quad P_1\,(q/p)^{z_2} + P_2\,(q/p)^{-z_1} = 1.$$

From this relation and equality $P_1 + P_2 = 1$ we have

$$P_1 = \frac{(q/p)^{-z_1} - 1}{(q/p)^{-z_1} - (q/p)^{z_2}}, \quad P_2 = 1 - P_1.$$

Using Wald's identity again, we also obtain that

$$\mathbf{E}\nu = \frac{\mathbf{E}S_\nu}{\mathbf{E}\xi_1} = \frac{P_1 z_2 - P_2 z_1}{p - q}.$$

Note that these equalities could have been obtained by elementary methods¹ but this would require lengthy calculations.


¹See, e.g., [12].
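The formulas of Example 15.2.2 are easy to evaluate exactly. The sketch below is illustrative only: the function names and the extra `start` argument (a starting point other than 0, obtained from the same martingale argument) are assumptions introduced here. The first-step equation u(x) = p u(x+1) + q u(x-1) gives an independent check of the formula.

```python
from fractions import Fraction


def upper_exit_probability(p, z1, z2, start=0):
    """P(the walk started at `start` reaches z2 before -z1), for p != 1/2.

    For start = 0 this is P_1 = ((q/p)^(-z1) - 1) / ((q/p)^(-z1) - (q/p)^(z2)).
    """
    r = (1 - p) / p
    return (r ** (-z1) - r ** start) / (r ** (-z1) - r ** z2)


def expected_duration(p, z1, z2):
    """E nu = (P_1 z2 - P_2 z1) / (p - q) by Wald's identity, walk started at 0."""
    p1 = upper_exit_probability(p, z1, z2)
    return (p1 * z2 - (1 - p1) * z1) / (2 * p - 1)


if __name__ == "__main__":
    p = Fraction(2, 5)
    print(upper_exit_probability(p, 3, 4), expected_duration(p, 3, 4))
```

Passing `Fraction` values for `p` keeps all arithmetic exact; floats work as well, up to rounding.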




In the cases when the nature of $S_\nu$ is simple enough, the assertions of the type of Theorems 15.2.1–15.2.2 enable one to obtain (or estimate) the distribution of the random variable $\nu$ itself. In such situations, the following assertion is rather helpful.

Suppose that the conditions of Theorem 15.2.5 are met, but, instead of conditions on the moments of $\xi_k$, the Cramér condition (cf. Chap. 9) is assumed to be satisfied:

$$\psi(\lambda) := \mathbf{E}e^{\lambda\xi} < \infty$$

for some $\lambda \ne 0$. In other words, if

$$\lambda_+ := \sup(\lambda : \psi(\lambda) < \infty) \ge 0, \quad \lambda_- := \inf(\lambda : \psi(\lambda) < \infty) \le 0,$$

then $\lambda_+ - \lambda_- > 0$. Everywhere in what follows we will only consider the values

$$\lambda \in B := \{\psi(\lambda) < \infty\} \subset [\lambda_-, \lambda_+]$$

for which $\psi'(\lambda) < \infty$. For such $\lambda$, the positive martingale

$$X_n = \frac{e^{\lambda S_n}}{\psi^n(\lambda)}, \quad X_0 = 1,$$

is well-defined so that $\mathbf{E}X_n = 1$.


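Theorem 15.2.6 Let $\nu$ be an arbitrary stopping time and $\lambda \in B$. Then

$$\mathbf{E}\Big(\frac{e^{\lambda S_\nu}}{\psi(\lambda)^{\nu}};\ \nu < \infty\Big) \le 1 \tag{15.2.24}$$

and, for any $s > 1$ and $r > 1$ such that $1/r + 1/s = 1$,

$$\mathbf{E}(e^{\lambda S_\nu};\ \nu < \infty) \le \big\{\mathbf{E}\big[\psi^{r\nu/s}(\lambda s);\ \nu < \infty\big]\big\}^{1/r}. \tag{15.2.25}$$

A necessary and sufficient condition for

$$\mathbf{E}\Big(\frac{e^{\lambda S_\nu}}{\psi(\lambda)^{\nu}};\ \nu < \infty\Big) = 1 \tag{15.2.26}$$

is that

$$\lim_{n\to\infty} \mathbf{E}\Big(\frac{e^{\lambda S_n}}{\psi(\lambda)^{n}};\ \nu > n\Big) = 0. \tag{15.2.27}$$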


Remark 15.2.1 Relation (15.2.26) is known as the fundamental Wald identity. In the
literature it is usually considered for a.s. finite $\nu$ (when $\mathbf{P}(\nu < \infty) = 1$) being in that
case an extension of the obvious equality $\mathbf{E}e^{\lambda S_n} = \psi^n(\lambda)$ to the case of random $\nu$.
Originally, identity (15.2.26) was established by A. Wald in the special case where 
v is the exit time of the sequence {S,,} from a finite interval (see Corollary 15.2.3), 
and was accompanied by rather restrictive conditions. Later, these conditions were 
removed (see e.g. [13]). Below we will obtain a more general assertion for the prob- 
lem on the first exit of the trajectory {S,} from a strip with curvilinear boundaries. 




Remark 15.2.2 The fundamental Wald identity shows that, although the nature of a stopping time could be quite general, there exists a stiff functional constraint (15.2.26) on the joint distribution of $\nu$ and $S_\nu$ (the distribution of $\xi_k$ is assumed to be known). In the cases where one of these variables can somehow be "computed" or "eliminated" (see Examples 15.2.2–15.2.4) Wald's identity turns into an explicit formula for the Laplace transform of the distribution of the other variable. If $\nu$ and $S_\nu$ prove to be independent (which rarely happens), then (15.2.26) gives the relationship

$$\mathbf{E}e^{\lambda S_\nu} = \big[\mathbf{E}\psi(\lambda)^{-\nu}\big]^{-1}$$

between the Laplace transforms of the distributions of $\nu$ and $S_\nu$.
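Proof of Theorem 15.2.6 As we have already noted, for

$$X_n = e^{\lambda S_n}\psi^{-n}(\lambda), \quad \mathfrak{F}_n = \sigma(\xi_1, \ldots, \xi_n),$$

$\{X_n, \mathfrak{F}_n;\ n \ge 0\}$ is a positive martingale with $X_0 = 1$ and $\mathbf{E}X_n = 1$. Corollary 15.2.2 immediately implies (15.2.24).

Inequality (15.2.25) is a consequence of Hölder's inequality and (15.2.24):

$$\mathbf{E}\big(e^{(\lambda/s) S_\nu};\ \nu < \infty\big) = \mathbf{E}\Big[\Big(\frac{e^{\lambda S_\nu}}{\psi^{\nu}(\lambda)}\Big)^{1/s} \psi^{\nu/s}(\lambda);\ \nu < \infty\Big] \le \big[\mathbf{E}\big(\psi^{r\nu/s}(\lambda);\ \nu < \infty\big)\big]^{1/r}.$$

The last assertion of the theorem (concerning the identity (15.2.26)) follows from Theorem 15.2.4.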
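The interplay between (15.2.26) and (15.2.27) can be illustrated numerically. The sketch below is a hedged illustration under assumptions made here, not taken from the text: a simple ±1 walk, ν the first exit time from (-z1, z2), and a hypothetical function that splits EX_n = 1 into the part collected on {ν ≤ n} and the remainder E(X_n; ν > n).

```python
import math


def wald_identity_parts(p, lam, z1, z2, horizon):
    """Return (E(X_nu; nu <= horizon), E(X_horizon; nu > horizon)) for
    X_n = exp(lam * S_n) / psi(lam)**n, nu = first exit time of the walk from (-z1, z2).
    """
    q = 1.0 - p
    psi = p * math.exp(lam) + q * math.exp(-lam)
    alive = {0: 1.0}        # P(S_n = s, nu > n)
    exited = 0.0
    remainder = 0.0
    for n in range(1, horizon + 1):
        nxt = {}
        for s, mass in alive.items():
            for step, w in ((1, p), (-1, q)):
                t, m = s + step, mass * w
                if t >= z2 or t <= -z1:
                    exited += m * math.exp(lam * t) / psi ** n
                elif n == horizon:
                    remainder += m * math.exp(lam * t) / psi ** n
                else:
                    nxt[t] = nxt.get(t, 0.0) + m
        alive = nxt
    return exited, remainder


if __name__ == "__main__":
    for horizon in (5, 20, 100, 400):
        print(horizon, wald_identity_parts(0.4, 0.3, 3, 4, horizon))
```

The two parts always add up to 1 (the martingale property at the bounded time min(ν, n)); as the horizon grows, the remainder vanishes and the first part approaches the value 1 in (15.2.26).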


We now consider several important special cases. Note that $\psi(\lambda)$ is a convex function ($\psi''(\lambda) > 0$), $\psi(0) = 1$, and therefore there exists a unique point $\lambda_0$ at which $\psi(\lambda)$ attains its minimum value $\psi(\lambda_0) \le 1$ (see also Sect. 9.1).


Corollary 15.2.3 Assume that we are given a sequence $g(n)$ such that

$$g^{+}(n) := \max(0, g(n)) = o(n) \quad \text{as } n \to \infty.$$

If $S_n \le g(n)$ holds on the set $\{\nu > n\}$, then (15.2.26) holds for $\lambda \in (\lambda_0, \lambda_+] \cap B$, $B = \{\lambda : \psi(\lambda) < \infty\}$.


The random variable $\nu = \nu_g = \inf\{k \ge 1 : S_k > g(k)\}$ for $g(k) = o(k)$ obviously satisfies the conditions of Corollary 15.2.3. For stopping times $\nu_g$ one could also consider the case $g(n)/n \to c \ge 0$ as $n \to \infty$, which can be reduced to the case $g(n) = o(n)$ by introducing the random variables

$$\xi_k^{*} := \xi_k - c, \quad S_k^{*} := \sum_{j=1}^{k} \xi_j^{*},$$

for which $\nu_g = \inf\{k \ge 1 : S_k^{*} > g(k) - ck\}$.




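Proof of Corollary 15.2.3 For $\lambda > \lambda_0$, $\lambda \in B$, we have

$$\begin{aligned}
\mathbf{E}\Big(\frac{e^{\lambda S_n}}{\psi^{n}(\lambda)};\ \nu > n\Big) &\le \psi^{-n}(\lambda)\,\mathbf{E}\big(e^{\lambda S_n};\ S_n \le g(n)\big) \\
&= \psi^{-n}(\lambda)\,\mathbf{E}\big(e^{(\lambda - \lambda_0) S_n} e^{\lambda_0 S_n};\ S_n \le g(n)\big) \\
&\le \psi^{-n}(\lambda)\,e^{(\lambda - \lambda_0) g^{+}(n)}\,\mathbf{E}\big(e^{\lambda_0 S_n};\ S_n \le g(n)\big) \\
&\le \psi^{-n}(\lambda)\,e^{(\lambda - \lambda_0) g^{+}(n)}\,\mathbf{E}e^{\lambda_0 S_n} = \Big(\frac{\psi(\lambda_0)}{\psi(\lambda)}\Big)^{n} e^{(\lambda - \lambda_0) g^{+}(n)} \to 0
\end{aligned}$$

as $n \to \infty$, because $(\lambda - \lambda_0) g^{+}(n) = o(n)$. It remains to use Theorem 15.2.6. The corollary is proved.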


We now return to Theorem 15.2.6 for arbitrary stopping times. It turns out that, based on the Cramér transform introduced in Sect. 9.1, one can complement its assertions without using any martingale techniques.

Together with the original distribution $\mathbf{P}$ of the sequence $\{\xi_k\}_{k=1}^{\infty}$, we introduce the family of distributions $\mathbf{P}_\lambda$ of this sequence in $(\mathbb{R}^{\infty}, \mathfrak{B}^{\infty})$ (see Sect. 5.5) generated by the finite-dimensional distributions

$$\mathbf{P}_\lambda(\xi_k \in dx_k) = \frac{e^{\lambda x_k}}{\psi(\lambda)}\,\mathbf{P}(\xi_k \in dx_k),$$

$$\mathbf{P}_\lambda(\xi_1 \in dx_1, \ldots, \xi_n \in dx_n) = \prod_{k=1}^{n} \mathbf{P}_\lambda(\xi_k \in dx_k).$$

This is the Cramér transform of the distribution $\mathbf{P}$.
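Theorem 15.2.7 Let $\nu$ be an arbitrary stopping time. Then, for any $\lambda \in B$,

$$\mathbf{E}\Big(\frac{e^{\lambda S_\nu}}{\psi^{\nu}(\lambda)};\ \nu < \infty\Big) = \mathbf{P}_\lambda(\nu < \infty). \tag{15.2.28}$$


Proof Since $\{\nu = n\} \in \sigma(\xi_1, \ldots, \xi_n)$, there exists a Borel set $D_n \subset \mathbb{R}^n$, such that

$$\{\nu = n\} = \{(\xi_1, \ldots, \xi_n) \in D_n\}.$$

Further,

$$\mathbf{E}\Big(\frac{e^{\lambda S_\nu}}{\psi^{\nu}(\lambda)};\ \nu < \infty\Big) = \sum_{n=0}^{\infty} \mathbf{E}\Big(\frac{e^{\lambda S_n}}{\psi^{n}(\lambda)};\ \nu = n\Big),$$

where

$$\begin{aligned}
\mathbf{E}\Big(\frac{e^{\lambda S_n}}{\psi^{n}(\lambda)};\ \nu = n\Big) &= \int_{(x_1, \ldots, x_n) \in D_n} \frac{e^{\lambda(x_1 + \cdots + x_n)}}{\psi^{n}(\lambda)}\,\mathbf{P}(\xi_1 \in dx_1, \ldots, \xi_n \in dx_n) \\
&= \int_{(x_1, \ldots, x_n) \in D_n} \mathbf{P}_\lambda(\xi_1 \in dx_1, \ldots, \xi_n \in dx_n) = \mathbf{P}_\lambda(\nu = n).
\end{aligned}$$

This proves the theorem.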
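For a walk with two-point steps the Cramér transform has a closed form, which makes its effect on the drift easy to see. The following sketch assumes (this is not in the text) a simple walk with P(ξ = 1) = p and P(ξ = -1) = 1 - p; the function names are hypothetical. In this case λ_0 = ½ ln(q/p) minimises ψ, and for p < 1/2 the positive root of ψ(λ) = 1 is ln(q/p).

```python
import math


def psi(p, lam):
    """psi(lam) = E exp(lam * xi) for P(xi = 1) = p, P(xi = -1) = 1 - p."""
    return p * math.exp(lam) + (1.0 - p) * math.exp(-lam)


def tilted_up_probability(p, lam):
    """P_lam(xi = 1) = exp(lam) * p / psi(lam): the Cramer transform of the step law."""
    return p * math.exp(lam) / psi(p, lam)


def tilted_mean(p, lam):
    """E_lam xi = psi'(lam) / psi(lam) for the two-point step distribution."""
    return 2.0 * tilted_up_probability(p, lam) - 1.0


if __name__ == "__main__":
    p = 0.3
    lam0 = 0.5 * math.log((1 - p) / p)
    mu = math.log((1 - p) / p)
    for lam in (0.0, lam0, mu):
        print(round(lam, 4), round(psi(p, lam), 4), round(tilted_mean(p, lam), 4))
```

The tilted drift is negative below λ_0, zero at λ_0 and positive above it; at the root of ψ(λ) = 1 the transform simply swaps p and q.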




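For a given function $g(n)$, consider now the stopping time

$$\nu = \nu_g = \inf\{k : S_k > g(k)\}$$

(cf. Corollary 15.2.3). The assertion of Theorem 15.2.7 can be obtained in that case in the following way. Denote by $\mathbf{E}_\lambda$ the expectation with respect to the distribution $\mathbf{P}_\lambda$.


Corollary 15.2.4 1. If $g^{+}(n) = \max(0, g(n)) = o(n)$ as $n \to \infty$ and $\lambda \in (\lambda_0, \lambda_+] \cap B$, then one has $\mathbf{P}_\lambda(\nu_g < \infty) = 1$ in relation (15.2.28).

2. If $g(n) \ge 0$ and $\lambda < \lambda_0$, then $\mathbf{P}_\lambda(\nu_g < \infty) < 1$.

3. For $\lambda = \lambda_0$, the distribution $\mathbf{P}_{\lambda_0}$ of the variable $\nu$ can either be proper (when one has $\mathbf{P}_{\lambda_0}(\nu_g < \infty) = 1$) or improper ($\mathbf{P}_{\lambda_0}(\nu_g < \infty) < 1$). If $\lambda_0 \in (\lambda_-, \lambda_+)$, $g(n) < (1 - \varepsilon)\sigma(2n \log\log n)^{1/2}$ for all $n \ge n_0$, starting from some $n_0$, and $\sigma^2 = \mathbf{E}_{\lambda_0}\xi_1^2$, then $\mathbf{P}_{\lambda_0}(\nu_g < \infty) = 1$.

But if $\lambda_0 \in (\lambda_-, \lambda_+)$, $g(n) \ge 0$, and $g(n) > (1 + \varepsilon)\sigma(2n \log\log n)^{1/2}$ for $n \ge n_0$, then $\mathbf{P}_{\lambda_0}(\nu_g < \infty) < 1$ (we exclude the trivial case $\xi_k \equiv 0$).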


Proof Since $\mathbf{E}_\lambda\xi_k = \psi'(\lambda)/\psi(\lambda)$, the expectation $\mathbf{E}_\lambda\xi_k$ is of the same sign as the difference $\lambda - \lambda_0$, and $\mathbf{E}_{\lambda_0}\xi_k = 0$ ($\psi'(\lambda_0) = 0$ if $\lambda_0 \in (\lambda_-, \lambda_+)$). Hence the first assertion follows from the relations

$$\mathbf{P}_\lambda(\nu = \infty) = \mathbf{P}_\lambda(X_n \le g(n) \text{ for all } n) \le \mathbf{P}_\lambda(X_n \le g^{+}(n)) \to 0$$

as $n \to \infty$ by the law of large numbers for the sums $X_n = \sum_{k=1}^{n} \xi_k$, since $\mathbf{E}_\lambda\xi_k > 0$.

The second assertion is a consequence of the strong law of large numbers since $\mathbf{E}_\lambda\xi_k < 0$ and hence $\mathbf{P}_\lambda(\nu = \infty) \ge \mathbf{P}_\lambda(\sup_n X_n \le 0) > 0$.

The last assertion of the corollary follows from the law of the iterated logarithm which we prove in Sect. 20.2. The corollary is proved.


The condition $g(n) \ge 0$ of part 2 of the corollary can clearly be weakened to the condition $g(n) = o(n)$, $\mathbf{P}(\nu > n) > 0$ for any $n > 0$. The same is true for part 3.

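An assertion similar to Corollary 15.2.4 is also true for the (stopping) time $\nu_{g_-,g_+}$ of the first passage of one of the two boundaries $g_{\pm}(n) = o(n)$:

$$\nu_{g_-,g_+} = \inf\{k \ge 1 : S_k > g_+(k) \text{ or } S_k < g_-(k)\}.$$


Corollary 15.2.5 For $\lambda \in B \setminus \{\lambda_0\}$, we have $\mathbf{P}_\lambda(\nu_{g_-,g_+} < \infty) = 1$.

If $\lambda = \lambda_0 \in (\lambda_-, \lambda_+)$, then the $\mathbf{P}_\lambda$-distribution of $\nu$ may be either proper or improper.

If, for some $n_0 > 2$,

$$g_{\pm}(n) \lessgtr \pm(1 - \varepsilon)\sigma\sqrt{2n \ln\ln n}$$

for $n \ge n_0$ then $\mathbf{P}_{\lambda_0}(\nu_{g_-,g_+} < \infty) = 1$.

If $g_{\pm}(n) \gtrless 0$ and, additionally,

$$g_{\pm}(n) \gtrless \pm(1 + \varepsilon)\sigma\sqrt{2n \ln\ln n}$$

for $n \ge n_0$ then $\mathbf{P}_{\lambda_0}(\nu_{g_-,g_+} < \infty) < 1$.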




Proof The first assertion follows from Corollary 15.2.4 applied to the sequences $\{\pm X_n\}$. The second is a consequence of the law of the iterated logarithm from Sect. 20.2.


We now consider several relations following from Corollaries 15.2.3, 15.2.4 and 15.2.5 (from identity (15.2.26)) for the random variables $\nu = \nu_g$ and $\nu = \nu_{g_-,g_+}$.

Let $a < 0$ and $\psi(\lambda_+) \ge 1$. Since $\psi'(0) = a < 0$ and the function $\psi(\lambda)$ is convex, the equation $\psi(\lambda) = 1$ will have a unique root $\mu > 0$ in the domain $\lambda > 0$. Setting $\lambda = \mu$ in (15.2.26) we obtain the following.


Corollary 15.2.6 If $a < 0$ and $\psi(\lambda_+) \ge 1$ then, for the stopping times $\nu = \nu_g$ and $\nu = \nu_{g_-,g_+}$, we have the equality

$$\mathbf{E}(e^{\mu S_\nu};\ \nu < \infty) = 1.$$


Remark 15.2.3 For an $x \ge 0$, put (as in Chap. 10) $\eta(x) := \inf\{k : S_k > x\}$. Since $S_{\eta(x)} = x + \chi(x)$, where $\chi(x) := S_{\eta(x)} - x$ is the value of overshoot over the level $x$, Corollary 15.2.6 implies

$$\mathbf{E}\big(e^{\mu(x + \chi(x))};\ \eta(x) < \infty\big) = 1. \tag{15.2.29}$$

Note that $\mathbf{P}(\eta(x) < \infty) = \mathbf{P}(S > x)$, where $S = \sup_{k \ge 0} S_k$. Therefore, Theorem 12.7.4 and (15.2.29) imply that, as $x \to \infty$,

$$e^{\mu x}\,\mathbf{P}(\eta(x) < \infty) = \big[\mathbf{E}\big(e^{\mu\chi(x)} \mid \eta(x) < \infty\big)\big]^{-1} \to c. \tag{15.2.30}$$

The last convergence relation corresponds to the fact that the limiting conditional distribution (as $x \to \infty$) $G$ of $\chi(x)$ exists given $\eta(x) < \infty$. If we denote by $\chi$ a random variable with the distribution $G$ then (15.2.30) will mean that $c = [\mathbf{E}e^{\mu\chi}]^{-1} < 1$. This provides an interpretation of the constant $c$ that is different from the one in Theorem 12.7.4.


In Corollary 15.2.6 we "eliminated" the "component" $\psi^{\nu}(\lambda)$ in identity (15.2.26). "Elimination" of the other component $e^{\lambda S_\nu}$ is possible only in some special cases of random walks, such as the so-called skip-free walks (see Sect. 12.8) or walks with exponentially (or geometrically) distributed $\xi_k^{+} = \max(0, \xi_k)$ or $\xi_k^{-} = -\min(0, \xi_k)$. We will illustrate this with two examples.


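Example 15.2.3 We return to the ruin problem discussed in Example 15.2.2. In that case, Corollary 15.2.4 gives, for $g_-(n) := -z_1$ and $g_+(n) = z_2$, that

$$e^{\lambda z_2}\,\mathbf{E}\big(\psi(\lambda)^{-\nu};\ S_\nu = z_2\big) + e^{-\lambda z_1}\,\mathbf{E}\big(\psi(\lambda)^{-\nu};\ S_\nu = -z_1\big) = 1.$$

In particular, for $z_1 = z_2 = z$ and $p = 1/2$, we have by symmetry that

$$\mathbf{E}\big(\psi(\lambda)^{-\nu};\ S_\nu = z\big) = \frac{1}{2}\,\mathbf{E}\psi(\lambda)^{-\nu}, \quad \mathbf{E}\psi(\lambda)^{-\nu} = \frac{2}{e^{\lambda z} + e^{-\lambda z}}. \tag{15.2.31}$$

Let $\lambda(s)$ be the unique positive solution of the equation $s\psi(\lambda) = 1$, $s \in (0, 1)$. Since here $\psi(\lambda) = \frac{1}{2}(e^{\lambda} + e^{-\lambda})$, solving the quadratic equation yields

$$e^{\lambda(s)} = \frac{1 + \sqrt{1 - s^2}}{s}.$$

Identity (15.2.31) now gives

$$\mathbf{E}s^{\nu} = 2\big(e^{\lambda(s) z} + e^{-\lambda(s) z}\big)^{-1}.$$

We obtain an explicit form of the generating function of the random variable $\nu$, which enables us to find the probabilities $\mathbf{P}(\nu = n)$, $n = 1, 2, \ldots$ by expanding elementary functions into series.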
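The closed form of the generating function can be compared with a direct computation of the probabilities P(ν = n). The sketch below is illustrative, with hypothetical function names: it propagates the distribution of the symmetric simple walk killed on leaving (-z, z) and sums the series numerically, truncated at a large index.

```python
import math


def exit_time_pmf(z, n_max):
    """P(nu = n), n = 0..n_max, where nu is the first time the symmetric
    simple random walk started at 0 hits z or -z."""
    alive = {0: 1.0}                 # P(S_n = s, nu > n)
    pmf = [0.0] * (n_max + 1)
    for n in range(1, n_max + 1):
        nxt = {}
        for s, mass in alive.items():
            for t in (s + 1, s - 1):
                if abs(t) >= z:
                    pmf[n] += mass / 2
                else:
                    nxt[t] = nxt.get(t, 0.0) + mass / 2
        alive = nxt
    return pmf


def generating_function(s, z):
    """E s^nu = 2 / (u^z + u^-z) with u = exp(lambda(s)) = (1 + sqrt(1 - s^2)) / s."""
    u = (1.0 + math.sqrt(1.0 - s * s)) / s
    return 2.0 / (u ** z + u ** (-z))


if __name__ == "__main__":
    z, s = 3, 0.9
    pmf = exit_time_pmf(z, 400)
    series = sum(s ** n * pmf[n] for n in range(len(pmf)))
    print(series, generating_function(s, z))
```

For z = 2 the formula reduces to s²/(2 - s²), which can also be obtained by hand from the two-step structure of the walk.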


Example 15.2.4 Simple explicit formulas can also be obtained from Wald's identity in the problem with one boundary, where $\nu = \nu_g$, $g(n) = z$. In that case, the class of distributions of $\xi_k$ could be wider than in Example 15.2.3. Suppose that one of the two following conditions holds (cf. Sect. 12.8).


1. The walk is arithmetic and skip-free, i.e. $\xi_k$ are integers, $\mathbf{P}(\xi_k = 1) > 0$ and $\mathbf{P}(\xi_k \ge 2) = 0$.
2. The walk is right exponential, i.e.

$$\mathbf{P}(\xi_k > t) = c\,e^{-\alpha t} \tag{15.2.32}$$

either for all $t > 0$ or for $t = 0, 1, 2, \ldots$ if the walk is integer-valued (the geometric distribution).


The random variable $\nu_g$ will be proper if and only if $\mathbf{E}\xi_k = \psi'(0) \ge 0$ (see Chaps. 9 and 12). For skip-free random walks, Wald's identity (15.2.26) yields ($g(n) = z > 0$, $S_\nu = z$)

$$e^{\lambda z}\,\mathbf{E}\psi^{-\nu}(\lambda) = 1, \quad \lambda > \lambda_0. \tag{15.2.33}$$

For $s \le 1$, the equation $\psi(\lambda) = s^{-1}$ (cf. Example 15.2.3) has in the domain $\lambda > \lambda_0$ a unique solution $\lambda(s)$. Therefore identity (15.2.33) can be written as

$$\mathbf{E}s^{\nu} = e^{-z\lambda(s)}. \tag{15.2.34}$$


This statement implies a series of results from Chaps. 9 and 12. Many properties of the distribution of $\nu := \nu_z$ can be derived from this identity, in particular, the asymptotics of $\mathbf{P}(\nu_z = n)$ as $z \to \infty$, $n \to \infty$. We already know one of the ways to find this asymptotics. It consists of using Theorem 12.8.4, which implies

$$\mathbf{P}(\nu_z = n) = \frac{z}{n}\,\mathbf{P}(S_n = z), \tag{15.2.35}$$

and the local Theorem 9.3.4 providing the asymptotics of $\mathbf{P}(S_n = z)$. Using relation (15.2.34) and the inversion formula is an alternative approach to studying the asymptotics of $\mathbf{P}(\nu_z = n)$. If we use the inversion formula, there will arise an integral of the form

$$\int_{|s| = 1} s^{-n} e^{-z\lambda(s)}\,ds, \tag{15.2.36}$$

where the integrand $s^{-n}e^{-z\lambda(s)}$, after the change of variable $\lambda(s) = \lambda$ (or $s = \psi(\lambda)^{-1}$), takes the form

$$\exp\{-[z\lambda - n\ln\psi(\lambda)]\}.$$

The integrand in the inversion formula for the probability $\mathbf{P}(S_n = z)$ has the same form. This probability has already been studied quite well (see Theorem 9.3.4); its exponential part has the form $e^{-n\Lambda(\alpha)}$, where $\alpha = z/n$, $\Lambda(\alpha) = \sup_\lambda(\alpha\lambda - \ln\psi(\lambda))$ is the large deviation rate function (see Sect. 9.1 and the footnote for Definition 9.1.1). A more detailed study of the inversion formula (15.2.36) allows us to obtain (15.2.35).

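Similar relations can be obtained for random walks with exponential right distribution tails. Let, for example, (15.2.32) hold for all $t > 0$. Then the conditional distribution $\mathbf{P}(S_\nu - z > t \mid \nu = n,\ S_{n-1} = x)$ coincides with the distribution

$$\mathbf{P}(\xi_n > z - x + t \mid \xi_n > z - x) = e^{-\alpha t}$$

and clearly depends neither on $n$ nor on $x$. This means that $\nu$ and $S_\nu$ are independent, $S_\nu = z + \gamma$, where $\gamma$ has the exponential distribution $\Gamma_\alpha$ with parameter $\alpha$,

$$\mathbf{E}\psi^{-\nu}(\lambda) = \frac{1}{\mathbf{E}e^{\lambda(z + \gamma)}} = e^{-\lambda z}\,\frac{\alpha - \lambda}{\alpha}, \quad \lambda_0 < \lambda < \alpha; \qquad \mathbf{E}s^{\nu} = e^{-z\lambda(s)}\,\frac{\alpha - \lambda(s)}{\alpha},$$

where $\lambda(s)$ is, as before, the only solution to the equation $\psi(\lambda) = s^{-1}$ in the domain $\lambda > \lambda_0$. This implies the same results as (15.2.34).

If $\mathbf{P}(\xi_k > t) = c_1 e^{-\alpha t}$ and $\mathbf{P}(\xi_k < -t) = c_2 e^{-\beta t}$, $t > 0$, then, in the problem with two boundaries, we obtain for $\nu = \nu_{g_-,g_+}$, $g_+(n) = z_2$ and $g_-(n) = -z_1$ in exactly the same way from (15.2.26) that

$$\frac{\alpha e^{\lambda z_2}}{\alpha - \lambda}\,\mathbf{E}\big(\psi^{-\nu}(\lambda);\ S_\nu > z_2\big) + \frac{\beta e^{-\lambda z_1}}{\beta + \lambda}\,\mathbf{E}\big(\psi^{-\nu}(\lambda);\ S_\nu < -z_1\big) = 1, \quad \lambda \in (-\beta, \alpha).$$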


15.3 Inequalities 
15.3.1 Inequalities for Martingales 


First of all we note that the property $\mathbf{E}X_n \le 1$ of the sequence $X_n = e^{\lambda S_n}\psi_0(\lambda)^{-n}$ forming a supermartingale for an appropriate function $\psi_0(\lambda)$ remains true when we replace $n$ with a stopping time $\nu$ (an analogue of inequality (15.2.24)) in a much more general case than that of Theorem 15.2.6. Namely, $\xi_k$ may be dependent.

Let, as before, $\{\mathfrak{F}_n\}$ be an increasing sequence of σ-algebras, and $\xi_n$ be $\mathfrak{F}_n$-measurable random variables. Suppose that a.s.

$$\mathbf{E}(e^{\lambda\xi_n} \mid \mathfrak{F}_{n-1}) \le \psi_0(\lambda). \tag{15.3.1}$$

This condition is always met if a.s.

$$\mathbf{P}(\xi_n \ge x \mid \mathfrak{F}_{n-1}) \le G(x), \quad \psi_0(\lambda) = -\int e^{\lambda x}\,dG(x) < \infty.$$

In that case the sequence $X_n = e^{\lambda S_n}\psi_0^{-n}(\lambda)$ forms a supermartingale:

$$\mathbf{E}(X_n \mid \mathfrak{F}_{n-1}) \le X_{n-1}, \quad \mathbf{E}X_n \le 1.$$


Theorem 15.3.1 Let (15.3.1) hold and $\nu$ be a stopping time. Then inequalities
(15.2.24) and (15.2.25) will hold true with $\psi$ replaced by $\psi_0$.


The Proof of the theorem repeats almost verbatim that of Theorem 15.2.6. 


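Now we will obtain inequalities for the distribution of

$$\overline{X}_n = \max_{k \le n} X_k \quad \text{and} \quad X_n^{*} = \max_{k \le n} |X_k|,$$

$X_n$ being an arbitrary submartingale.


Theorem 15.3.2 (Doob) Let $\{X_n, \mathfrak{F}_n;\ n \ge 0\}$ be a nonnegative submartingale. Then, for all $x \ge 0$ and $n \ge 0$,

$$\mathbf{P}(\overline{X}_n > x) \le \frac{1}{x}\,\mathbf{E}X_n.$$


Proof Let

$$\nu = \eta(x) := \inf\{k \ge 0 : X_k > x\}, \quad \nu(n) := \min(\nu, n).$$

It is obvious that $n$ and $\nu(n)$ are stopping times, $\nu(n) \le n$, and therefore, by Theorem 15.2.1 (see (15.2.3) for $\nu_2 = n$, $\nu_1 = \nu(n)$),

$$\mathbf{E}X_n \ge \mathbf{E}X_{\nu(n)}.$$

Observing that $\{\overline{X}_n > x\} = \{X_{\nu(n)} > x\}$, we have from Chebyshev's inequality that

$$\mathbf{P}(\overline{X}_n > x) = \mathbf{P}(X_{\nu(n)} > x) \le \frac{1}{x}\,\mathbf{E}X_{\nu(n)} \le \frac{1}{x}\,\mathbf{E}X_n.$$

The theorem is proved.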
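Doob's inequality can be checked exactly on a small example. The sketch below is an illustration, not part of the original argument: it assumes the nonnegative submartingale X_k = |S_k| built from the symmetric simple random walk and enumerates all 2^n paths, so it is only practical for small n. The function name is hypothetical, and x is taken to be a positive integer to keep the arithmetic rational.

```python
from fractions import Fraction
from itertools import accumulate, product


def doob_sides(n, x):
    """Exact (P(max_{k<=n} |S_k| > x), E|S_n| / x) for the symmetric simple walk."""
    hits = 0
    abs_sum = 0
    for steps in product((1, -1), repeat=n):
        path = [abs(s) for s in accumulate(steps)]   # |S_1|, ..., |S_n|
        hits += max(path) > x
        abs_sum += path[-1]
    total = 2 ** n
    return Fraction(hits, total), Fraction(abs_sum, total * x)


if __name__ == "__main__":
    for x in (1, 2, 3, 4):
        lhs, rhs = doob_sides(10, x)
        print(x, lhs, rhs, lhs <= rhs)
```

The left-hand side never exceeds the right-hand side, as Theorem 15.3.2 asserts; the gap shows how much is lost by the Chebyshev step.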


Theorem 15.3.2 implies the following. 


Theorem 15.3.3 (The second Kolmogorov inequality) Let $\{X_n, \mathfrak{F}_n;\ n \ge 0\}$ be a martingale with a finite second moment $\mathbf{E}X_n^2$. Then $\{X_n^2, \mathfrak{F}_n;\ n \ge 0\}$ is a submartingale and by Theorem 15.3.2

$$\mathbf{P}(X_n^{*} > x) \le \frac{1}{x^2}\,\mathbf{E}X_n^2.$$

Originally A.N. Kolmogorov established this inequality for sums $X_n = \xi_1 + \cdots + \xi_n$ of independent random variables $\xi_n$. Theorem 15.3.3 extends Kolmogorov's proof to the case of submartingales and refines Chebyshev's inequality.

The following generalisation of Theorem 15.3.3 is also valid. 




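Theorem 15.3.4 If $\{X_n, \mathfrak{F}_n;\ n \ge 0\}$ is a martingale and $\mathbf{E}|X_n|^p < \infty$, $p \ge 1$, then $\{|X_n|^p, \mathfrak{F}_n;\ n \ge 0\}$ forms a nonnegative submartingale and, for all $x > 0$,

$$\mathbf{P}(X_n^{*} > x) \le \frac{1}{x^p}\,\mathbf{E}|X_n|^p.$$

If $\{X_n, \mathfrak{F}_n;\ n \ge 0\}$ is a submartingale, $\mathbf{E}e^{\lambda X_n} < \infty$, $\lambda > 0$, then $\{e^{\lambda X_n}, \mathfrak{F}_n;\ n \ge 0\}$ also forms a nonnegative submartingale,

$$\mathbf{P}(\overline{X}_n > x) \le e^{-\lambda x}\,\mathbf{E}e^{\lambda X_n}.$$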


Both Theorem 15.3.4 and Theorem 15.3.3 immediately follow from Lemma 15.1.3 and Theorem 15.3.2.

If $X_n = S_n = \sum_{k=1}^{n} \xi_k$, where $\xi_k$ are independent, identically distributed and satisfy the Cramér condition: $\lambda_+ := \sup\{\lambda : \psi(\lambda) < \infty\} > 0$, then, with the help of the fundamental Wald identity, one can obtain sharper inequalities for $\mathbf{P}(\overline{X}_n > x)$ in the case $a = \mathbf{E}\xi_k < 0$.

Recall that, in the case $a = \psi'(0) < 0$, the function $\psi(\lambda) = \mathbf{E}e^{\lambda\xi_k}$ decreases in a neighbourhood of $\lambda = 0$, and, provided that $\psi(\lambda_+) \ge 1$, the equation $\psi(\lambda) = 1$ has a unique solution $\mu$ in the domain $\lambda > 0$.

Let $\xi$ be a random variable having the same distribution as $\xi_k$. Put

$$\psi_+ := \sup_{t > 0} \mathbf{E}\big(e^{\mu(\xi - t)} \mid \xi > t\big), \quad \psi_- := \inf_{t > 0} \mathbf{E}\big(e^{\mu(\xi - t)} \mid \xi > t\big).$$

If, for instance, $\mathbf{P}(\xi > t) = c\,e^{-\alpha t}$ for $t > 0$ (in this case necessarily $\alpha > \mu$ in (15.2.32)), then

$$\mathbf{P}(\xi - t > v \mid \xi > t) = \frac{\mathbf{P}(\xi > t + v)}{\mathbf{P}(\xi > t)} = e^{-\alpha v}, \quad \psi_+ = \psi_- = \frac{\alpha}{\alpha - \mu}.$$

A similar equality holds for integer-valued $\xi$ with a geometric distribution. For other distributions, one has $\psi_+ > \psi_-$.

Under the above conditions, one has the following assertion which supplements Theorem 12.7.4 for the distribution of the random variable $S = \sup_k S_k$.


Theorem 15.3.5 If $a = \mathbf{E}\xi < 0$ then

$$\psi_+^{-1} e^{-\mu x} \le \mathbf{P}(S > x) \le \psi_-^{-1} e^{-\mu x}, \quad x > 0. \tag{15.3.2}$$


This theorem implies that, in the case of exponential right tails of the distribution of $\xi$ (see (15.2.32)), inequalities (15.3.2) become the exact equality

$$\mathbf{P}(S > x) = \frac{\alpha - \mu}{\alpha}\,e^{-\mu x}.$$

(The same result was obtained in Example 12.5.1.) This means that inequalities (15.3.2) are unimprovable. Since $\overline{S}_n = \max_{k \le n} S_k \le S$, relation (15.3.2) implies that, for any $n$,

$$\mathbf{P}(\overline{S}_n > x) \le \psi_-^{-1} e^{-\mu x}.$$


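Proof of Theorem 15.3.5 Set $\nu := \infty$ if $S = \sup_{k \ge 0} S_k \le x$, and put $\nu := \eta(x) = \min\{k : S_k > x\}$ otherwise. Further, let $\chi(x) := S_{\eta(x)} - x$ be the excess of the level $x$. We have

$$\begin{aligned}
\mathbf{P}(\chi(x) > v;\ \nu < \infty) &= \sum_{k=1}^{\infty} \int_{-\infty}^{x} \mathbf{P}(\overline{S}_{k-1} \le x,\ S_{k-1} \in du,\ \xi_k > x - u + v) \\
&= \sum_{k=1}^{\infty} \int_{-\infty}^{x} \mathbf{P}(\overline{S}_{k-1} \le x,\ S_{k-1} \in du,\ \xi_k > x - u)\,\mathbf{P}(\xi_k > x - u + v \mid \xi_k > x - u),
\end{aligned}$$

$$\begin{aligned}
\mathbf{E}(e^{\mu\chi(x)};\ \nu < \infty) &\le \sum_{k=1}^{\infty} \int_{-\infty}^{x} \mathbf{P}(\overline{S}_{k-1} \le x,\ S_{k-1} \in du,\ \xi_k > x - u)\,\psi_+ \\
&= \psi_+ \sum_{k=1}^{\infty} \mathbf{P}(\nu = k) = \psi_+\,\mathbf{P}(\nu < \infty).
\end{aligned}$$

Similarly,

$$\mathbf{E}(e^{\mu\chi(x)};\ \nu < \infty) \ge \psi_-\,\mathbf{P}(\nu < \infty).$$

Next, by Corollary 15.2.6,

$$1 = \mathbf{E}(e^{\mu S_\nu};\ \nu < \infty) = e^{\mu x}\,\mathbf{E}(e^{\mu\chi(x)};\ \nu < \infty) \le e^{\mu x}\,\psi_+\,\mathbf{P}(\nu < \infty).$$

Because $\mathbf{P}(\nu < \infty) = \mathbf{P}(S > x)$, we get from this the left inequality of Theorem 15.3.5. The right inequality is obtained in the same way. The theorem is proved.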


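Remark 15.3.1 We proved Theorem 15.3.5 with the help of the fundamental Wald identity. But there is a direct proof based on the following relations:

$$\begin{aligned}
\psi^{n}(\lambda) = \mathbf{E}e^{\lambda S_n} &\ge \sum_{k=1}^{n} \mathbf{E}\big(e^{\lambda(S_k + S_n - S_k)};\ \nu = k\big) \\
&= \sum_{k=1}^{n} \mathbf{E}\big(e^{\lambda(x + \chi(x))} e^{\lambda(S_n - S_k)};\ \nu = k\big).
\end{aligned} \tag{15.3.3}$$

Here the random variables $e^{\lambda\chi(x)}\,\mathrm{I}(\nu = k)$ and $S_n - S_k$ are independent and, as before,

$$\mathbf{E}(e^{\mu\chi(x)};\ \nu = k) \ge \psi_-\,\mathbf{P}(\nu = k).$$

Therefore, for $\lambda = \mu$ (when $\psi(\lambda) = 1$),

$$\psi^{n}(\mu) \ge e^{\mu x}\,\psi_- \sum_{k=1}^{n} \psi^{n-k}(\mu)\,\mathbf{P}(\nu = k) = \psi_-\,e^{\mu x}\,\psi^{n}(\mu)\,\mathbf{P}(\nu \le n).$$

Hence we obtain

$$\mathbf{P}(\overline{S}_n > x) = \mathbf{P}(\nu \le n) \le \psi_-^{-1} e^{-\mu x}.$$

Since the right-hand side does not depend on $n$, the same inequality also holds for $\mathbf{P}(S > x)$. The lower bound is obtained in a similar way. One just has to show that, in the original equality (cf. (15.3.3))

$$\psi^{n}(\lambda) = \sum_{k=1}^{n} \mathbf{E}(e^{\lambda S_n};\ \nu = k) + \mathbf{E}(e^{\lambda S_n};\ \nu > n),$$

one has $\mathbf{E}(e^{\lambda S_n};\ \nu > n) = o(1)$ as $n \to \infty$ for $\lambda = \mu$, which we did in Sect. 15.2.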


15.3.2 Inequalities for the Number of Crossings of a Strip 


We now return to arbitrary submartingales $X_n$ and prove an inequality that will be necessary for the convergence theorems of the next section. It concerns the number of crossings of a strip by the sequence $X_n$. Let $a < b$ be given numbers. Set $\nu_0 = 0$,

$$\nu_1 := \min\{n \ge 0 : X_n \le a\}, \quad \nu_2 := \min\{n \ge \nu_1 : X_n \ge b\},$$

$$\nu_{2k-1} := \min\{n \ge \nu_{2k-2} : X_n \le a\}, \quad \nu_{2k} := \min\{n \ge \nu_{2k-1} : X_n \ge b\}.$$

We put $\nu_m := \infty$ if the path $\{X_n\}$ for $n \ge \nu_{m-1}$ never crosses the corresponding level. Using this notation, one can define the number of upcrossings of the strip (interval) $[a, b]$ by the trajectory $X_0, \ldots, X_n$ as the random variable

$$\nu(a, b; n) := \begin{cases} \max\{k : \nu_{2k} \le n\} & \text{if } \nu_2 \le n, \\ 0 & \text{if } \nu_2 > n. \end{cases}$$

Set $(a)^{+} = \max(0, a)$.


Theorem 15.3.6 (Doob) Let $\{X_n, \mathfrak{F}_n;\ n \ge 0\}$ be a submartingale. Then, for all $n$,

$$\mathbf{E}\nu(a, b; n) \le \frac{\mathbf{E}(X_n - a)^{+}}{b - a}. \tag{15.3.4}$$

It is clear that inequality (15.3.4) assumes by itself that only the submartingale $\{X_k, \mathfrak{F}_k;\ 0 \le k \le n\}$ is given.


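Proof The random variable $\nu(a, b; n)$ coincides with the number of upcrossings of the interval $[0, b - a]$ by the sequence $(X_n - a)^{+}$. Now $\{(X_n - a)^{+}, \mathfrak{F}_n;\ n \ge 0\}$ is a nonnegative submartingale (see Example 15.1.4) and therefore, without loss of generality, one can assume that $a = 0$ and $X_n \ge 0$, and aim to prove that

$$\mathbf{E}\nu(0, b; n) \le \frac{\mathbf{E}X_n}{b}.$$

Let

$$\eta_j := \begin{cases} 1 & \text{if } \nu_k < j \le \nu_{k+1} \text{ for some odd } k, \\ 0 & \text{if } \nu_k < j \le \nu_{k+1} \text{ for some even } k. \end{cases}$$

[Fig. 15.1 Illustration to the proof of Theorem 15.3.6 showing the locations of the random times $\nu_1$, $\nu_2$, and $\nu_3$ (here $a = 0$)]

In Fig. 15.1, $\nu_1 = 2$, $\nu_2 = 5$, $\nu_3 = 8$; $\eta_j = 0$ for $j \le 2$, $\eta_j = 1$ for $3 \le j \le 5$ etc. It is not hard to see (using the Abel transform) that (with $X_0 = 0$, $\eta_0 = 1$)

$$\eta_0 X_0 + \sum_{j=1}^{n} \eta_j (X_j - X_{j-1}) = \sum_{j=0}^{n-1} X_j (\eta_j - \eta_{j+1}) + \eta_n X_n \ge b\,\nu(0, b; n).$$

Moreover (here $N_1$ denotes the set of odd numbers),

$$\{\eta_j = 1\} = \bigcup_{k \in N_1} \{\nu_k < j \le \nu_{k+1}\} = \bigcup_{k \in N_1} \big[\{\nu_k \le j - 1\} \setminus \{\nu_{k+1} \le j - 1\}\big] \in \mathfrak{F}_{j-1}.$$

Therefore, by virtue of the relation $\mathbf{E}(X_j \mid \mathfrak{F}_{j-1}) - X_{j-1} \ge 0$, we obtain

$$\begin{aligned}
b\,\mathbf{E}\nu(0, b; n) &\le \mathbf{E}\sum_{j=1}^{n} \eta_j (X_j - X_{j-1}) = \sum_{j=1}^{n} \mathbf{E}(X_j - X_{j-1};\ \eta_j = 1) \\
&= \sum_{j=1}^{n} \mathbf{E}\big[\mathbf{E}(X_j - X_{j-1} \mid \mathfrak{F}_{j-1});\ \eta_j = 1\big] = \sum_{j=1}^{n} \mathbf{E}\big[\mathbf{E}(X_j \mid \mathfrak{F}_{j-1}) - X_{j-1};\ \eta_j = 1\big] \\
&\le \sum_{j=1}^{n} \mathbf{E}\big[\mathbf{E}(X_j \mid \mathfrak{F}_{j-1}) - X_{j-1}\big] = \sum_{j=1}^{n} \mathbf{E}(X_j - X_{j-1}) = \mathbf{E}X_n.
\end{aligned}$$

The theorem is proved.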
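The definition of the number of upcrossings translates directly into a short counting routine, and inequality (15.3.4) can then be checked exactly on a small example. The sketch below is illustrative and not part of the original proof: the function names are hypothetical, the submartingale is taken to be the symmetric simple random walk started at 0 (a martingale), and the integer levels a < b keep the arithmetic rational.

```python
from fractions import Fraction
from itertools import accumulate, product


def count_upcrossings(path, a, b):
    """nu(a, b; n) for path = [X_0, ..., X_n]: the number of completed
    passages from a value <= a to a later value >= b."""
    count = 0
    below = False        # True once the path has visited (-inf, a] after the last upcrossing
    for x in path:
        if not below:
            if x <= a:
                below = True
        elif x >= b:
            count += 1
            below = False
    return count


def upcrossing_sides(n, a, b):
    """Exact (E nu(a, b; n), E (S_n - a)^+ / (b - a)) for the symmetric simple walk."""
    up_total = 0
    plus_total = 0
    for steps in product((1, -1), repeat=n):
        path = [0, *accumulate(steps)]
        up_total += count_upcrossings(path, a, b)
        plus_total += max(path[-1] - a, 0)
    return Fraction(up_total, 2 ** n), Fraction(plus_total, 2 ** n * (b - a))


if __name__ == "__main__":
    print(count_upcrossings([0, -1, 0, 1, 0, -1, 1], -1, 1))
    print(upcrossing_sides(10, -1, 1))
```

The flag `below` plays the role of the indicator η_j from the proof: it is switched on at the odd-numbered times ν_{2k-1} and off at the even-numbered times ν_{2k}.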


15.4 Convergence Theorems 


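Theorem 15.4.1 (Doob's martingale convergence theorem) Let

$$\{X_n, \mathfrak{F}_n;\ -\infty < n < \infty\}$$

be a submartingale. Then

(1) The limit $X_{-\infty} := \lim_{n \to -\infty} X_n$ exists a.s., $\mathbf{E}X_{-\infty}^{+} < \infty$, and the process $\{X_n, \mathfrak{F}_n;\ -\infty \le n < \infty\}$ is a submartingale.

(2) If $\sup_n \mathbf{E}X_n^{+} < \infty$ then $X_\infty := \lim_{n \to \infty} X_n$ exists a.s. and $\mathbf{E}X_\infty^{+} < \infty$. If, moreover, $\sup_n \mathbf{E}|X_n| < \infty$ then $\mathbf{E}|X_\infty| < \infty$.

(3) The random sequence $\{X_n, \mathfrak{F}_n;\ -\infty \le n \le \infty\}$ forms a submartingale if and only if the sequence $\{X_n^{+}\}$ is uniformly integrable.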




Proof (1) Since 
{lim sup X, > liminf X,} = U {limsup X, > b >a > liminf X;} 
rational 
a,b 
(here the limits are taken as n — —oo), the assumption on divergence with positive 
probability 
P(lim sup X;,, > liminf X;,) > 0 

means that there exist rational numbers a < b such that 

Pdimsup X, > b> a> liminf X,) > 0. (15.4.1) 


Let v(a, b; m) be the number of upcrossings of the interval [a, b] by the sequence 
Y, = X_m,.--, %¥m = X_, and v(a, b) = lim. v(a, b; m). Then (15.4.1) means 
that 


P(v(a, b) = 00) > 0. (15.4.2) 
By Theorem 15.3.6 (applied to the sequence Y),..., Yin), 
E(X_1—a)t _ EX™, + lal 


Eva, b; < < ; 15.4.3 
v(a, b; m) a a ( ) 
EX? +lal 
Ev(a, b) < —————_.. (15.4.4) 
b-a 
Inequality (15.4.4) contradicts (15.4.2) and hence proves that 
P(lim sup X,, = liminf X,,) = 1. 

Moreover, by the Fatou—Lebesgue theorem (X es := liminf X = ), 

EX?,, <liminf X} <EX?, <oo. (15.4.5) 


Here the second inequality follows from the fact that {X ai , on} is also a submartin- 
gale (see Lemma 15.1.3) and therefore EX; +. 

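By Lemma 15.1.2, to prove that $\{X_n, \mathfrak{F}_n;\ -\infty \le n < \infty\}$ is a submartingale, it
suffices to verify that, for any $A \in \mathfrak{F}_{-\infty} \subset \mathfrak{F}$,
\[
\mathbf{E}(X_{-\infty}; A) \le \mathbf{E}(X_n; A). \tag{15.4.6}
\]
Set $X_n(a) := \max(X_n, a)$. By Lemma 15.1.4, $\{X_n(a), \mathfrak{F}_n;\ n \le 0\}$ is a uniformly
integrable submartingale. Therefore, for any $-\infty < k < n$,
\[
\mathbf{E}(X_k(a); A) \le \mathbf{E}(X_n(a); A),
\]
\[
\mathbf{E}(X_{-\infty}(a); A) = \lim_{k\to-\infty} \mathbf{E}(X_k(a); A) \le \mathbf{E}(X_n(a); A). \tag{15.4.7}
\]
Letting $a \to -\infty$ we obtain (15.4.6) from the monotone convergence theorem.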

(2) The second assertion of the theorem is proved in the same way. One just has
to replace the right-hand sides of (15.4.3) and (15.4.4) with $(\mathbf{E}X_m^+ + |a|)/(b - a)$ and
$(\sup_n \mathbf{E}X_n^+ + |a|)/(b - a)$, respectively. Instead of (15.4.5) we get (the limits here are as $n \to \infty$)
\[
\mathbf{E}X_\infty^+ \le \liminf \mathbf{E}X_n^+ < \infty,
\]




and if $\sup_n \mathbf{E}|X_n| < \infty$ then
\[
\mathbf{E}|X_\infty| \le \liminf \mathbf{E}|X_n| < \infty.
\]


(3) The last assertion of the theorem is proved in exactly the same way as the
first one: the uniform integrability enables us to deduce along with (15.4.7) that,
for any $A \in \mathfrak{F}_n$,
\[
\mathbf{E}(X_\infty(a); A) = \lim_{k\to\infty} \mathbf{E}(X_k(a); A) \ge \mathbf{E}(X_n(a); A).
\]


The converse part of the third assertion of the theorem follows from Lemma 15.1.4. 
The theorem is proved. 


Now we will obtain some consequences of Theorem 15.4.1. 

So far (see Sect. 4.8), while studying convergence of conditional expectations,
we dealt with expectations of the form $\mathbf{E}(X_n \mid \mathfrak{F})$. Now we can obtain from
Theorem 15.4.1 a useful theorem on convergence of conditional expectations of another
type.


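Theorem 15.4.2 (Lévy) Let a nondecreasing family $\mathfrak{F}_1 \subset \mathfrak{F}_2 \subset \cdots \subset \mathfrak{F}$ of
$\sigma$-algebras and a random variable $\xi$, with $\mathbf{E}|\xi| < \infty$, be given on a probability space
$(\Omega, \mathfrak{F}, \mathbf{P})$. Let, as before, $\mathfrak{F}_\infty := \sigma(\bigcup_n \mathfrak{F}_n)$ be the $\sigma$-algebra generated by events
from $\mathfrak{F}_1, \mathfrak{F}_2, \ldots$. Then, as $n \to \infty$,
\[
\mathbf{E}(\xi \mid \mathfrak{F}_n) \xrightarrow{\text{a.s.}} \mathbf{E}(\xi \mid \mathfrak{F}_\infty). \tag{15.4.8}
\]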


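Proof Set $X_n := \mathbf{E}(\xi \mid \mathfrak{F}_n)$. We already know (see Example 15.1.3) that the sequence
$\{X_n, \mathfrak{F}_n;\ 1 \le n \le \infty\}$ is a martingale and therefore, by Theorem 15.4.1, the limit
$\lim_{n\to\infty} X_n = X_{(\infty)}$ exists a.s. It remains to prove that $X_{(\infty)} = \mathbf{E}(\xi \mid \mathfrak{F}_\infty)$ (i.e., that
$X_{(\infty)} = X_\infty$). Since $\{X_n, \mathfrak{F}_n;\ 1 \le n \le \infty\}$ is by Lemma 15.1.4 a uniformly
integrable martingale,
\[
\mathbf{E}(X_{(\infty)}; A) = \lim_{n\to\infty} \mathbf{E}(X_n; A) = \lim_{n\to\infty} \mathbf{E}\big(\mathbf{E}(\xi \mid \mathfrak{F}_n); A\big) = \mathbf{E}(\xi; A)
\]
for $A \in \mathfrak{F}_k$ and any $k = 1, 2, \ldots$ This means that the left- and right-hand sides of
the last relation, being finite measures, coincide on the algebra $\bigcup_{n=1}^\infty \mathfrak{F}_n$. By the
theorem on extension of a measure (see Appendix 1), they will coincide for all
$A \in \sigma(\bigcup_{n=1}^\infty \mathfrak{F}_n) = \mathfrak{F}_\infty$. Therefore, by the definition of conditional expectation,
\[
X_{(\infty)} = \mathbf{E}(\xi \mid \mathfrak{F}_\infty) = X_\infty.
\]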


The theorem is proved. 


We could also note that the uniform integrability of $\{X_n, \mathfrak{F}_n;\ 1 \le n \le \infty\}$ implies
that $\xrightarrow{\text{a.s.}}$ in (15.4.8) can be replaced by convergence in mean, $\xrightarrow{(1)}$.


Theorem 15.4.1 implies the strong law of large numbers. Indeed, turn to our
Example 15.1.4. By Theorem 15.4.1, the limit $X_{-\infty} = \lim_{n\to-\infty} X_n = \lim_{n\to\infty} n^{-1}S_n$
exists a.s. and is measurable with respect to the tail (trivial) $\sigma$-algebra, and therefore
it is constant with probability 1. Since $\mathbf{E}X_{-\infty} = \mathbf{E}\xi_1$, we have $n^{-1}S_n \xrightarrow{\text{a.s.}} \mathbf{E}\xi_1$.




One can also obtain some extensions of the theorems on series convergence of
Chap. 11 to the case of dependent variables. Let
\[
X_n = S_n = \sum_{k=1}^{n} \xi_k
\]
and $X_n$ form a submartingale ($\mathbf{E}(\xi_{n+1} \mid \mathfrak{F}_n) \ge 0$). Let, moreover, $\mathbf{E}|X_n| < c$ for all
$n$ and for some $c < \infty$. Then the limit $S_\infty = \lim_{n\to\infty} S_n$ exists a.s. (Like
Theorem 15.4.1, this assertion is a generalisation of the monotone convergence theorem.
The crucial role is played here by the condition that $\mathbf{E}|X_n|$ is bounded.) In
particular, if $\xi_k$ are independent, $\mathbf{E}\xi_k = 0$, and the variances $\sigma_k^2$ of $\xi_k$ are such that
$\sum_{k=1}^\infty \sigma_k^2 < \sigma^2 < \infty$, then
\[
\mathbf{E}|X_n| \le (\mathbf{E}X_n^2)^{1/2} \le \Big(\sum_{k=1}^{n} \sigma_k^2\Big)^{1/2} \le \sigma < \infty,
\]
and therefore $S_n \xrightarrow{\text{a.s.}} S_\infty$. Thus we obtain, as a consequence, the Kolmogorov theorem
on series convergence.


Example 15.4.1 Consider a branching process $\{Z_n\}$ (see Sect. 7.7). We know that
$Z_n$ admits a representation
\[
Z_n = \zeta_1 + \cdots + \zeta_{Z_{n-1}},
\]
where the $\zeta_k$ are identically distributed integer-valued random variables independent
of each other and of $Z_{n-1}$, $\zeta_k$ being the number of descendants of the $k$-th particle
from the $(n-1)$-th generation. Assuming that $Z_0 = 1$ and setting $\mu := \mathbf{E}\zeta_k$, we
obtain
\[
\mathbf{E}(Z_n \mid Z_{n-1}) = \mu Z_{n-1}, \qquad \mathbf{E}Z_n = \mu\,\mathbf{E}Z_{n-1} = \mu^n.
\]
This implies that $X_n = Z_n/\mu^n$ is a martingale, because
\[
\mathbf{E}(X_n \mid X_{n-1}) = \mu^{-n}\,\mathbf{E}(Z_n \mid Z_{n-1}) = \mu^{-(n-1)} Z_{n-1} = X_{n-1}.
\]
For branching processes we have the following. 


Theorem 15.4.3 The sequence $X_n = \mu^{-n}Z_n$ converges almost surely to a proper
random variable $X$ with $\mathbf{E}X < \infty$. The ch.f. $\varphi(\lambda)$ of the random variable $X$ satisfies
the equation
\[
\varphi(\mu\lambda) = p(\varphi(\lambda)),
\]
where $p(v) = \mathbf{E}v^{\zeta}$.

Theorem 15.4.3 means that $\mu^{-n}Z_n$ has a proper limiting distribution as $n \to \infty$.


Proof Since $X_n \ge 0$ and $\mathbf{E}X_n = 1$, the first assertion follows immediately from
Theorem 15.4.1.




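Since $\mathbf{E}z^{Z_n}$ is equal to the $n$-th iteration of the function $p(z)$, for the ch.f. of $Z_n$
we have ($\varphi_\eta(\lambda) := \mathbf{E}e^{i\lambda\eta}$)
\[
\varphi_{Z_n}(\lambda) = p\big(\varphi_{Z_{n-1}}(\lambda)\big),
\]
\[
\varphi_{X_n}(\lambda) = \varphi_{Z_n}(\mu^{-n}\lambda) = p\big(\varphi_{Z_{n-1}}(\mu^{-n}\lambda)\big) = p\Big(\varphi_{X_{n-1}}\Big(\frac{\lambda}{\mu}\Big)\Big).
\]
Because $X_n \Rightarrow X$ and the function $p$ is continuous, from this we obtain the equation
for the ch.f. of the limiting distribution $X$:
\[
\varphi(\lambda) = p\Big(\varphi\Big(\frac{\lambda}{\mu}\Big)\Big).
\]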


In Sect. 7.7 we established that in the case $\mu \le 1$ the process $Z_n$ becomes extinct
with probability 1 and therefore $\mathbf{P}(X = 0) = 1$. We verify now that, for $\mu > 1$, the
distribution of $X$ is nondegenerate (not concentrated at zero). It suffices to prove
that $\{X_n,\ 0 \le n \le \infty\}$ forms a martingale and consequently
\[
\mathbf{E}X = \mathbf{E}X_n \ne 0.
\]


The theorem is proved. 


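By Theorem 15.4.1, it suffices to verify that the sequence $X_n$ is uniformly integrable.
To simplify the reasoning, we suppose that $\mathrm{Var}(\zeta_k) = \sigma^2 < \infty$ and show that then
$\mathbf{E}X_n^2 \le c < \infty$ (this certainly implies the required uniform integrability of $X_n$, see
Sect. 6.1). One can directly verify the identity
\[
Z_n^2 - \mu^{2n} = \sum_{k=1}^{n} \big[Z_k^2 - (\mu Z_{k-1})^2\big]\,\mu^{2n-2k}.
\]
Since $\mathbf{E}[Z_k^2 - (\mu Z_{k-1})^2 \mid Z_{k-1}] = \sigma^2 Z_{k-1}$ (recall that $\mathrm{Var}(\eta) = \mathbf{E}(\eta^2 - (\mathbf{E}\eta)^2)$), we
have
\[
\mathrm{Var}(Z_n) = \mathbf{E}(Z_n^2 - \mu^{2n}) = \sum_{k=1}^{n} \mu^{2n-2k}\sigma^2\mu^{k-1}
= \mu^{2n-1}\sigma^2 \sum_{k=1}^{n} \mu^{-k} = \frac{\sigma^2\mu^n(\mu^n - 1)}{\mu(\mu - 1)},
\]
\[
\mathbf{E}X_n^2 = \mu^{-2n}\,\mathbf{E}Z_n^2 = 1 + \frac{\sigma^2(1 - \mu^{-n})}{\mu(\mu - 1)} \le 1 + \frac{\sigma^2}{\mu(\mu - 1)}.
\]
Thus we have proved that $X$ is a nondegenerate random variable,
\[
\mathbf{E}X = 1, \qquad \mathrm{Var}(X_n) \to \frac{\sigma^2}{\mu(\mu - 1)}.
\]
From the last relation one can easily obtain that $\mathrm{Var}(X) = \frac{\sigma^2}{\mu(\mu - 1)}$. To this end one
can, say, prove that $X_n$ is a Cauchy sequence in mean quadratic and hence (see
Theorem 6.1.3) $X_n \xrightarrow{(2)} X$.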
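
The variance computation above can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes the one-step recursion $\mathrm{Var}(Z_k) = \mu^2\,\mathrm{Var}(Z_{k-1}) + \sigma^2\mu^{k-1}$ (the law of total variance applied to $Z_k$ given $Z_{k-1}$), and the values of `mu` and `sigma2` are hypothetical. Exact rational arithmetic is used so that the comparison with the closed form is an equality rather than an approximation.

```python
from fractions import Fraction


def var_z_recursive(mu, sigma2, n):
    """Var(Z_n) via Var(Z_k) = mu^2 Var(Z_{k-1}) + sigma^2 mu^(k-1), with Z_0 = 1."""
    var = Fraction(0)
    for k in range(1, n + 1):
        var = mu**2 * var + sigma2 * mu**(k - 1)
    return var


def var_z_closed(mu, sigma2, n):
    """Closed form sigma^2 mu^n (mu^n - 1) / (mu (mu - 1)); requires mu != 1."""
    return sigma2 * mu**n * (mu**n - 1) / (mu * (mu - 1))


# Hypothetical offspring mean and variance (any mu > 1 would do).
mu, sigma2 = Fraction(3, 2), Fraction(5, 4)
limit = sigma2 / (mu * (mu - 1))  # claimed limit of Var(X_n)

for n in (1, 5, 20):
    assert var_z_recursive(mu, sigma2, n) == var_z_closed(mu, sigma2, n)
    var_x = var_z_closed(mu, sigma2, n) / mu**(2 * n)  # Var(X_n) = mu^(-2n) Var(Z_n)
    assert var_x < limit
    print(n, float(var_x), float(limit))
```

The printed values of $\mathrm{Var}(X_n)$ increase towards $\sigma^2/(\mu(\mu-1))$, in agreement with the relation displayed above.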


15.5 Boundedness of the Moments of Stochastic Sequences 


When one uses convergence theorems for martingales, conditions ensuring boundedness
of the moments of stochastic sequences $\{X_n, \mathfrak{F}_n\}$ are of significant interest
(recall that the boundedness of $\mathbf{E}X_n$ is one of the crucial conditions for convergence
of submartingales). The boundedness of the moments, in turn, ensures that $X_n$ is
stochastically bounded, i.e., that $\sup_n \mathbf{P}(X_n > N) \to 0$ as $N \to \infty$. The last boundedness
is also of independent interest in the cases where one is not able to prove, for
the sequence $\{X_n\}$, convergence or any other ergodic properties.

For simplicity's sake, we confine ourselves to considering nonnegative sequences
$X_n \ge 0$. Of course, if we could prove convergence of the distributions of $X_n$ to a
limiting distribution, as was the case for Markov chains or submartingales in
Theorem 15.4.1, then we would have a more detailed description of the asymptotic
behaviour of $X_n$ as $n \to \infty$. This convergence, however, requires that the sequence
$X_n$ satisfies stronger constraints than will be used below.

The basic and rather natural elements of the boundedness conditions to be considered
below are: the boundedness of the moments of $\xi_n = X_n - X_{n-1}$ of the respective
orders and the presence of a negative "drift" $\mathbf{E}(\xi_n \mid \mathfrak{F}_{n-1})$ in the domain
$X_{n-1} > N$ for sufficiently large $N$. Such a property has already been utilised for
Markov chains; see Corollary 13.7.1 (otherwise the trajectory of $X_n$ may go to $\infty$).

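Let us begin with exponential moments. The simplest conditions ensuring the
boundedness of $\sup_n \mathbf{E}e^{\lambda X_n}$ for some $\lambda > 0$ are as follows: for all $n \ge 1$ and some
$\lambda > 0$ and $N < \infty$,
\[
\mathbf{E}(e^{\lambda\xi_n} \mid \mathfrak{F}_{n-1})\, I(X_{n-1} > N) \le \beta(\lambda) < 1, \tag{15.5.1}
\]
\[
\mathbf{E}(e^{\lambda\xi_n} \mid \mathfrak{F}_{n-1})\, I(X_{n-1} \le N) \le \psi(\lambda) < \infty. \tag{15.5.2}
\]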


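Theorem 15.5.1 If conditions (15.5.1) and (15.5.2) hold then
\[
\mathbf{E}(e^{\lambda X_n} \mid \mathfrak{F}_0) \le \beta^n(\lambda)\,e^{\lambda X_0} + \frac{e^{\lambda N}\psi(\lambda)}{1 - \beta(\lambda)}. \tag{15.5.3}
\]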


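Proof Denote by $A_n$ the left-hand side of (15.5.3). Then, by virtue of (15.5.1) and
(15.5.2), we obtain
\[
\begin{aligned}
A_n &= \mathbf{E}\big\{\mathbf{E}\big[e^{\lambda X_n}\big(I(X_{n-1} > N) + I(X_{n-1} \le N)\big) \mid \mathfrak{F}_{n-1}\big] \mid \mathfrak{F}_0\big\} \\
&\le \mathbf{E}\big[e^{\lambda X_{n-1}}\big(\beta(\lambda)\,I(X_{n-1} > N) + \psi(\lambda)\,I(X_{n-1} \le N)\big) \mid \mathfrak{F}_0\big] \\
&\le \beta(\lambda)A_{n-1} + e^{\lambda N}\psi(\lambda).
\end{aligned}
\]
This immediately implies that
\[
A_n \le A_0\beta^n(\lambda) + e^{\lambda N}\psi(\lambda)\sum_{k=0}^{n-1}\beta^k(\lambda) \le A_0\beta^n(\lambda) + \frac{e^{\lambda N}\psi(\lambda)}{1 - \beta(\lambda)}.
\]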


The theorem is proved. 
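
The last step of the proof is the standard unrolling of a linear recursive inequality. A minimal numerical sketch follows; it iterates the extreme case $a_n = \beta a_{n-1} + c$ with equality, and the constants `beta`, `c` and `a0` are hypothetical stand-ins for $\beta(\lambda)$, $e^{\lambda N}\psi(\lambda)$ and $A_0$.

```python
def iterate_bound(a0, beta, c, n):
    """Iterate a_k = beta * a_{k-1} + c for n steps, starting from a_0 = a0."""
    a = a0
    for _ in range(n):
        a = beta * a + c
    return a


# Hypothetical constants with 0 < beta < 1.
beta, c, a0 = 0.8, 2.5, 10.0

for n in (0, 1, 10, 100):
    a_n = iterate_bound(a0, beta, c, n)
    bound = a0 * beta**n + c / (1 - beta)  # right-hand side of the unrolled inequality
    assert a_n <= bound + 1e-9
    print(n, a_n, bound)
```

As $n$ grows the influence of the initial value decays geometrically, and both columns approach $c/(1-\beta)$.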




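The conditions
\[
\mathbf{E}(\xi_n \mid \mathfrak{F}_{n-1}) \le -\varepsilon < 0 \quad \text{on the } \omega\text{-set } \{X_{n-1} > N\}, \tag{15.5.4}
\]
\[
\mathbf{E}(e^{\lambda|\xi_n|} \mid \mathfrak{F}_{n-1}) \le \psi_1(\lambda) < \infty \quad \text{for some } \lambda > 0 \tag{15.5.5}
\]
are sufficient for (15.5.1) and (15.5.2).

The first condition means that $Y_n := (X_n + \varepsilon n)\, I(X_{n-1} > N)$ is a supermartingale.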

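We now prove sufficiency of (15.5.4) and (15.5.5). That (15.5.2) holds is clear.
Further, make use of the inequality
\[
e^x \le 1 + x + \frac{x^2}{2}\,e^{|x|},
\]
which follows from the Taylor formula for $e^x$ with the remainder in the Lagrange
form:
\[
e^x = 1 + x + \frac{x^2}{2}\,e^{\theta x}, \qquad \theta \in [0, 1].
\]
Then, on the set $\{X_{n-1} > N\}$, one has
\[
\mathbf{E}(e^{\lambda\xi_n} \mid \mathfrak{F}_{n-1}) \le 1 - \lambda\varepsilon + \frac{\lambda^2}{2}\,\mathbf{E}\big(\xi_n^2 e^{\lambda|\xi_n|} \mid \mathfrak{F}_{n-1}\big).
\]
Since $x^2 < e^{\lambda x/2}$ for sufficiently large $x$, by the Hölder inequality it follows that,
together with (15.5.5), we will have
\[
\mathbf{E}\big(\xi_n^2 e^{\lambda|\xi_n|/2} \mid \mathfrak{F}_{n-1}\big) \le \psi_2(\lambda) < \infty.
\]
This implies that, for sufficiently small $\lambda$, one has on the set $\{X_{n-1} > N\}$ the
inequality
\[
\mathbf{E}(e^{\lambda\xi_n} \mid \mathfrak{F}_{n-1}) \le 1 - \lambda\varepsilon + \frac{\lambda^2}{2}\,\psi_2(\lambda) =: \beta(\lambda) \le 1 - \frac{\lambda\varepsilon}{2} < 1.
\]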


This proves (15.5.1). 


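Corollary 15.5.1 If, in addition to the conditions of Theorem 15.5.1, the distribution
of $X_n$ converges to a limiting distribution: $\mathbf{P}(X_n < t) \Rightarrow \mathbf{P}(X < t)$, then
\[
\mathbf{E}e^{\lambda X} \le \frac{e^{\lambda N}\psi(\lambda)}{1 - \beta(\lambda)}.
\]
The corollary follows from the Fatou-Lebesgue theorem (see also Lemma 6.1.1):
\[
\mathbf{E}e^{\lambda X} \le \liminf_{n\to\infty} \mathbf{E}e^{\lambda X_n}.
\]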


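We now obtain bounds for "conventional" moments. Set
\[
M^l(n) := \mathbf{E}X_n^l,
\]
\[
m(0) := 1, \qquad m(1) := \sup_{n \ge 1}\ \sup_{\omega \in \{X_{n-1} > N\}} \mathbf{E}(\xi_n \mid \mathfrak{F}_{n-1}),
\]
\[
m(l) := \sup_{n \ge 1}\ \sup_{\omega}\ \mathbf{E}(|\xi_n|^l \mid \mathfrak{F}_{n-1}), \qquad l > 1.
\]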




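Theorem 15.5.2 Assume that $\mathbf{E}X_0^s < \infty$ for some $s > 1$ and there exist $N \ge 0$ and
$\varepsilon > 0$ such that
\[
m(1) \le -\varepsilon, \tag{15.5.6}
\]
\[
m(s) < c < \infty. \tag{15.5.7}
\]
Then
\[
\liminf_{n\to\infty} M^{s-1}(n) < \infty. \tag{15.5.8}
\]
If, moreover,
\[
M^s(n+1) > M^s(n) - c_1 \tag{15.5.9}
\]
for some $c_1 > 0$, then
\[
\sup_n M^{s-1}(n) < \infty. \tag{15.5.10}
\]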


Corollary 15.5.2 If conditions (15.5.6) and (15.5.7) are met and the distribution
of $X_n$ converges weakly to a limiting distribution: $\mathbf{P}(X_n < t) \Rightarrow \mathbf{P}(X < t)$, then
$\mathbf{E}X^{s-1} < \infty$.

This assertion follows from the Fatou-Lebesgue theorem (see also Lemma 6.1.1),
which implies
\[
\mathbf{E}X^{s-1} \le \liminf_{n\to\infty} \mathbf{E}X_n^{s-1}.
\]


The assertion of Corollary 15.5.2 is unimprovable. One can see this from the
example of the sequence $X_n = (X_{n-1} + \zeta_n)^+$, where the $\zeta_k$ are independent and
identically distributed. If $\mathbf{E}\zeta_k < 0$ then the limiting distribution of $X_n$ coincides with
the distribution of $S = \sup_k S_k$ (see Sect. 12.4). From factorisation identities one can
derive that $\mathbf{E}S^{s-1}$ is finite if and only if $\mathbf{E}(\zeta^+)^s < \infty$. An outline of the proof is as
follows. Theorem 12.3.2 implies that $\mathbf{E}S^k = c\,\mathbf{E}(\chi_+^k;\ \eta_+ < \infty)$, $c = \text{const} < \infty$. It
follows from Corollary 12.2.2 that
\[
1 - \mathbf{E}(e^{i\lambda\chi_+};\ \eta_+ < \infty) = (1 - \mathbf{E}e^{i\lambda\zeta}) \int_0^\infty e^{-i\lambda x}\,dH(x),
\]
where $H(x)$ is the renewal function for the random variable $-\chi_-^0 \ge 0$. Since
\[
a_1 + b_1 x \le H(x) \le a_2 + b_2 x
\]
(see Theorem 10.1.1 and Lemma 10.1.1; $a_i$, $b_i$ are constants), integrating the
convolution
\[
\mathbf{P}(\chi_+ > x,\ \eta_+ < \infty) = \int_0^\infty \mathbf{P}(\zeta > v + x)\,dH(v)
\]
by parts we verify that, as $x \to \infty$, the left-hand side has the same order of magnitude
as $\int_0^\infty \mathbf{P}(\zeta > v + x)\,dv$. Hence the required statement follows.




We now return to Theorem 15.5.2. Note that in all of the most popular problems
the sequence $M^{s-1}(n)$ behaves "regularly": either it is bounded or $M^{s-1}(n) \to \infty$.
Assertion (15.5.8) means that, under the conditions of Theorem 15.5.2, the second
possibility is excluded. Condition (15.5.9) ensuring (15.5.10) is also rather
broad.


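Proof of Theorem 15.5.2 Let for simplicity's sake $s > 1$ be an integer. We have
\[
\mathbf{E}(X_n^s;\ X_{n-1} > N) = \int_N^\infty \mathbf{E}\big((x + \xi_n)^s;\ X_{n-1} \in dx\big)
= \sum_{l=0}^{s} \binom{s}{l} \int_N^\infty x^l\,\mathbf{E}\big(\xi_n^{s-l};\ X_{n-1} \in dx\big).
\]
If we replace $\xi_n^{s-l}$ for $s - l \ge 2$ with $|\xi_n|^{s-l}$ then the right-hand side can only
increase. Therefore,
\[
\mathbf{E}(X_n^s;\ X_{n-1} > N) \le \sum_{l=0}^{s} \binom{s}{l}\, m(s-l)\, M_N^l(n-1),
\]
where
\[
M_N^l(n) = \mathbf{E}(X_n^l;\ X_n > N).
\]
The moments $M^s(n) = \mathbf{E}X_n^s$ satisfy the inequalities
\[
\begin{aligned}
M^s(n) &\le \mathbf{E}\big[(N + |\xi_n|)^s;\ X_{n-1} \le N\big] + \sum_{l=0}^{s} \binom{s}{l}\, m(s-l)\, M_N^l(n-1) \\
&\le 2^s\big[N^s + c\big] + \sum_{l=0}^{s} \binom{s}{l}\, m(s-l)\, M_N^l(n-1).
\end{aligned} \tag{15.5.11}
\]
Suppose now that (15.5.8) does not hold: $M^{s-1}(n) \to \infty$. Then all the more
$M^s(n) \to \infty$ and there exists a subsequence $n'$ such that $M^s(n') > M^s(n' - 1)$.
Since $M^l(n) \le [M^{l+1}(n)]^{l/(l+1)}$, we obtain from (15.5.6) and (15.5.11) that
\[
\begin{aligned}
M^s(n') &\le \text{const} + M^s(n' - 1) + s\,M^{s-1}(n' - 1)\,m(1) + o\big(M^{s-1}(n' - 1)\big) \\
&\le M^s(n' - 1) - \tfrac{1}{2}\,s\,\varepsilon\,M^{s-1}(n' - 1)
\end{aligned}
\]
for sufficiently large $n'$. This contradicts the assumption that $M^s(n) \to \infty$ and hence
proves (15.5.8).

We now prove (15.5.10). If this relation is not true then there exists a sequence
$n'$ such that $M^{s-1}(n') \to \infty$ and $M^s(n') > M^s(n' - 1) - c_1$. It remains to make use
of the above argument.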

We leave the proof for a non-integer s > 1 to the reader (the changes are elemen- 
tary). The theorem is proved. 




Remark 15.5.1 (1) The assertions of Theorems 15.5.1 and 15.5.2 will remain valid
if one requires inequalities (15.5.4) or $\mathbf{E}(\xi_n + \varepsilon \mid \mathfrak{F}_{n-1})\, I(X_{n-1} > N) \le 0$ to hold not
for all $n$, but only for $n \ge n_0$ for some $n_0 > 1$.

(2) As in Theorem 15.5.1, condition (15.5.6) means that the sequence of random
variables $(X_n + \varepsilon n)\, I(X_{n-1} > N)$ forms a supermartingale.

(3) The conditions of Theorems 15.5.1 and 15.5.2 may be weakened by replacing
them with "averaged" conditions. Consider, for instance, condition (15.5.1). By
integrating it over the set $\{X_{n-1} > x \ge N\}$ we obtain
\[
\mathbf{E}(e^{\lambda\xi_n};\ X_{n-1} > x) \le \beta(\lambda)\,\mathbf{P}(X_{n-1} > x)
\]
or, which is the same,
\[
\mathbf{E}(e^{\lambda\xi_n} \mid X_{n-1} > x) \le \beta(\lambda). \tag{15.5.12}
\]
The converse assertion that (15.5.12) for all $x \ge N$ implies relation (15.5.1) is
obviously false, so that condition (15.5.12) is weaker than (15.5.1). A similar remark is
true for condition (15.5.4).


One has the following generalisations of Theorems 15.5.1 and 15.5.2 to the case 
of “averaged conditions”. 


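Theorem 15.5.1A Let, for some $\lambda > 0$, $N > 0$ and all $x \ge N$,
\[
\mathbf{E}(e^{\lambda\xi_n} \mid X_{n-1} > x) \le \beta(\lambda) < 1, \qquad \mathbf{E}(e^{\lambda\xi_n};\ X_{n-1} \le N) \le \psi(\lambda) < \infty.
\]
Then
\[
\mathbf{E}e^{\lambda X_n} \le \beta^n(\lambda)\,\mathbf{E}e^{\lambda X_0} + \frac{e^{\lambda N}\psi(\lambda)}{1 - \beta(\lambda)}.
\]
Put
\[
m(1) := \sup_{n \ge 1}\ \sup_{x \ge N}\ \mathbf{E}(\xi_n \mid X_{n-1} > x),
\]
\[
m(l) := \sup_{n \ge 1}\ \sup_{x \ge N}\ \mathbf{E}(|\xi_n|^l \mid X_{n-1} > x), \qquad l > 1.
\]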


Theorem 15.5.2A Let $\mathbf{E}X_0^s < \infty$ and there exist $N \ge 0$ and $\varepsilon > 0$ such that
\[
m(1) \le -\varepsilon, \qquad m(s) < \infty, \qquad \mathbf{E}(|\xi_n|^s;\ X_{n-1} \le N) < c < \infty.
\]
Then (15.5.8) holds true. If, in addition, (15.5.9) is valid, then (15.5.10) is true.

The proofs of Theorems 15.5.1A and 15.5.2A are quite similar to those of
Theorems 15.5.1 and 15.5.2. The only additional element in both cases is integration by
parts. We will illustrate this with the proof of Theorem 15.5.1A. Consider




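\[
\begin{aligned}
\mathbf{E}(e^{\lambda X_n};\ X_{n-1} > N) &= \int_N^\infty e^{\lambda x}\,\mathbf{E}(e^{\lambda\xi_n};\ X_{n-1} \in dx) \\
&\le \mathbf{E}(e^{\lambda(N + \xi_n)};\ X_{n-1} > N) + \int_N^\infty \lambda e^{\lambda x}\,\mathbf{E}(e^{\lambda\xi_n};\ X_{n-1} > x)\,dx \\
&\le \mathbf{E}(e^{\lambda(N + \xi_n)};\ X_{n-1} > N) + \beta(\lambda)\int_N^\infty \lambda e^{\lambda x}\,\mathbf{P}(X_{n-1} > x)\,dx \\
&= e^{\lambda N}\,\mathbf{E}(e^{\lambda\xi_n} - \beta(\lambda);\ X_{n-1} > N) + \beta(\lambda)\int_N^\infty e^{\lambda x}\,\mathbf{P}(X_{n-1} \in dx) \\
&\le \beta(\lambda)\,\mathbf{E}(e^{\lambda X_{n-1}};\ X_{n-1} > N).
\end{aligned}
\]
From this we find that
\[
\begin{aligned}
\beta_n(\lambda) := \mathbf{E}e^{\lambda X_n} &\le \mathbf{E}(e^{\lambda(X_{n-1} + \xi_n)};\ X_{n-1} \le N) + \mathbf{E}(e^{\lambda X_n};\ X_{n-1} > N) \\
&\le e^{\lambda N}\psi(\lambda) + \beta(\lambda)\,\mathbf{E}(e^{\lambda X_{n-1}};\ X_{n-1} > N) \\
&\le e^{\lambda N}\psi(\lambda) + \beta(\lambda)\,\beta_{n-1}(\lambda);
\end{aligned}
\]
\[
\beta_n(\lambda) \le \beta^n(\lambda)\,\beta_0(\lambda) + \frac{e^{\lambda N}\psi(\lambda)}{1 - \beta(\lambda)}.
\]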


Note that Theorem 13.7.2 and Corollary 13.7.1 on “positive recurrence” can also 
be referred to as theorems on boundedness of stochastic sequences. 


Chapter 16 
Stationary Sequences 


Abstract Section 16.1 contains the definitions and a discussion of the concepts 
of strictly stationary sequences and measure preserving transformations. It also 
presents Poincaré’s theorem on the number of visits to a given set by a stationary se- 
quence. Section 16.2 discusses invariance, ergodicity, mixing and weak dependence. 
The Birkhoff-Khintchin ergodic theorem is stated and proved in Sect. 16.3.


16.1 Basic Notions 


Let $(\Omega, \mathfrak{F}, \mathbf{P})$ be a probability space and $\boldsymbol{\xi} = (\xi_0, \xi_1, \ldots)$ an infinite sequence of
random variables given on it.


Definition 16.1.1 A sequence $\boldsymbol{\xi}$ is said to be strictly stationary if, for any $k$, the
distribution of the vector $(\xi_n, \ldots, \xi_{n+k})$ does not depend on $n$, $n \ge 0$.


Along with the sequence $\boldsymbol{\xi}$, consider the sequence $(\xi_n, \xi_{n+1}, \ldots)$. Since the
finite-dimensional distributions of these sequences (i.e. the distributions of the vectors
$(\xi_m, \ldots, \xi_{m+k})$) coincide, the distributions of the sequences will also coincide (one
has to make use of the measure extension theorem (see Appendix 1) or the Kolmogorov
theorem (see Sect. 3.5)). In other words, for a stationary sequence $\boldsymbol{\xi}$, for
any $n$ and $B \in \mathfrak{B}^\infty$ (for notation see Sect. 3.5), one has
\[
\mathbf{P}(\boldsymbol{\xi} \in B) = \mathbf{P}\big((\xi_n, \xi_{n+1}, \ldots) \in B\big).
\]


The simplest example of a stationary sequence is given by a sequence of independent
identically distributed random variables $\boldsymbol{\zeta} = (\zeta_0, \zeta_1, \ldots)$. It is evident that
the sequence $\xi_k = \alpha_0\zeta_k + \cdots + \alpha_s\zeta_{k+s}$, $k = 0, 1, 2, \ldots$, will also be stationary, but
the variables $\xi_k$ will no longer be independent. The same holds for sequences of the
form
\[
\xi_k = \sum_{j=0}^\infty \alpha_j\zeta_{k+j},
\]
provided that $\mathbf{E}|\zeta_j| < \infty$, $\sum |\alpha_j| < \infty$, or if $\mathbf{E}\zeta_k = 0$, $\mathrm{Var}(\zeta_k) < \infty$, $\sum \alpha_j^2 < \infty$ (the
latter ensures a.s. convergence of the series of random variables, see Sect. 10.2). In a
similar way one can consider stationary sequences $\xi_k = g(\zeta_k, \zeta_{k+1}, \ldots)$ "generated"
by $\boldsymbol{\zeta}$, where $g(x)$ is an arbitrary measurable functional $\mathbb{R}^\infty \to \mathbb{R}$.

A.A. Borovkov, Probability Theory, Universitext,
DOI 10.1007/978-1-4471-5201-9_16, © Springer-Verlag London 2013

Another example is given by stationary Markov chains. If $\{X_n\}$ is a real-valued
Markov chain with invariant measure $\pi$ and transition probability $P(\cdot, \cdot)$ then the
chain $\{X_n\}$ with $X_0$ distributed according to $\pi$ will form a stationary sequence, because the distribution
\[
\mathbf{P}(X_n \in B_0, \ldots, X_{n+k} \in B_k) = \int_{B_0} \pi(dx_0) \int_{B_1} P(x_0, dx_1) \cdots \int_{B_k} P(x_{k-1}, dx_k)
\]
will not depend on $n$.

Any stationary sequence $\boldsymbol{\xi} = (\xi_0, \xi_1, \ldots)$ can always be extended to a stationary
sequence $\boldsymbol{\xi} = (\ldots, \xi_{-1}, \xi_0, \xi_1, \ldots)$ given on the "whole axis".

Indeed, for any $n$, $-\infty < n < \infty$, and $k \ge 0$ define the joint distributions of
$(\xi_n, \ldots, \xi_{n+k})$ as those of $(\xi_0, \ldots, \xi_k)$. These distributions will clearly be consistent
(see Sect. 3.5) and by the Kolmogorov theorem there will exist a unique probability
distribution on $\mathbb{R}_{-\infty}^{\infty} = \prod_{k=-\infty}^{\infty} \mathbb{R}_k$ with the respective $\sigma$-algebra such that any
finite-dimensional distribution is a projection of that distribution on the corresponding
subspace. It remains to take the random element $\boldsymbol{\xi}$ to be the identity mapping
of $\mathbb{R}_{-\infty}^{\infty}$ onto itself.

In some of the subsequent sections it will be convenient for us to use stationary 
sequences given on the whole axis. 

Let $\boldsymbol{\xi}$ be such a sequence. Define a transformation $\theta$ of the space $\mathbb{R}_{-\infty}^{\infty}$ onto itself
with the help of the relations
\[
(\theta x)_k = (x)_{k+1} = x_{k+1}, \tag{16.1.1}
\]
where $(x)_k$ is the $k$-th component of the vector $x \in \mathbb{R}_{-\infty}^{\infty}$, $-\infty < k < \infty$. The
transformation $\theta$ clearly has the following properties:
1. It is a one-to-one mapping, $\theta^{-1}$ is defined by
\[
(\theta^{-1}x)_k = x_{k-1}.
\]
2. The sequence $\theta\boldsymbol{\xi}$ is also stationary, its distribution coinciding with that of $\boldsymbol{\xi}$:
\[
\mathbf{P}(\theta\boldsymbol{\xi} \in B) = \mathbf{P}(\boldsymbol{\xi} \in B).
\]

It is natural to call the last property of the transformation $\theta$ the "measure preserving"
property.

The above remarks explain to some extent why, historically, the study of the
properties of stationary sequences followed the route of studying measure preserving
transformations. Studies in that area constitute a substantial part of modern analysis.
In what follows, we will relate the construction of stationary sequences to measure 
preserving transformations, and it will be more convenient to regard the latter as 
“primary” objects. 


Definition 16.1.2 Let $(\Omega, \mathfrak{F}, \mathbf{P})$ be the basic probability space. A transformation $T$
of $\Omega$ into itself is said to be measure preserving if:

(1) $T$ is measurable, i.e. $T^{-1}A = \{\omega : T\omega \in A\} \in \mathfrak{F}$ for any $A \in \mathfrak{F}$; and
(2) $T$ preserves the measure: $\mathbf{P}(T^{-1}A) = \mathbf{P}(A)$ for any $A \in \mathfrak{F}$.


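Let $T$ be a measure preserving transformation, $T^n$ its $n$-th iteration and $\xi = \xi(\omega)$
be a random variable. Put $U\xi(\omega) = \xi(T\omega)$, so that $U$ is a transformation of random
variables, and $U^k\xi(\omega) = \xi(T^k\omega)$. Then
\[
\boldsymbol{\xi} = \{U^n\xi(\omega)\}_0^\infty = \{\xi(T^n\omega)\}_0^\infty \tag{16.1.2}
\]
is a stationary sequence of random variables.

Proof Indeed, let $A = \{\omega : \boldsymbol{\xi} \in B\}$, $B \in \mathfrak{B}^\infty$ and $A_1 = \{\omega : \theta\boldsymbol{\xi} \in B\}$. We have
\[
\boldsymbol{\xi} = (\xi(\omega), \xi(T\omega), \ldots), \qquad \theta\boldsymbol{\xi} = (\xi(T\omega), \xi(T^2\omega), \ldots).
\]
Therefore $\omega \in A_1$ if and only if $T\omega \in A$, i.e. when $A_1 = T^{-1}A$. But $\mathbf{P}(T^{-1}A) =
\mathbf{P}(A)$ and hence $\mathbf{P}(A_1) = \mathbf{P}(A)$, so that $\mathbf{P}(A_n) = \mathbf{P}(A)$ for any $n \ge 1$ as well, where
$A_n = \{\omega : \theta^n\boldsymbol{\xi} \in B\}$.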


Stationary sequences defined by (16.1.2) will be referred to as sequences gener- 
ated by the transformation T . 

To be able to construct stationary sequences on the whole axis, we will need mea- 
sure preserving transformations acting both in “positive” and “negative” directions. 


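Definition 16.1.3 A transformation $T$ is said to be bidirectional measure preserving
if:
(1) $T$ is a one-to-one transformation, the domain and range of $T$ coincide with the
whole $\Omega$;
(2) the transformations $T$ and $T^{-1}$ are measurable, i.e.
\[
T^{-1}A = \{\omega : T\omega \in A\} \in \mathfrak{F}, \qquad TA = \{T\omega : \omega \in A\} \in \mathfrak{F}
\]
for any $A \in \mathfrak{F}$;
(3) the transformation $T$ preserves the measure: $\mathbf{P}(T^{-1}A) = \mathbf{P}(A)$, and therefore
$\mathbf{P}(A) = \mathbf{P}(TA)$ for any $A \in \mathfrak{F}$.


For such transformations we can, as before, construct stationary sequences $\boldsymbol{\xi}$
defined on the whole axis:
\[
\boldsymbol{\xi} = \{U^n\xi(\omega)\}_{-\infty}^{\infty} = \{\xi(T^n\omega)\}_{-\infty}^{\infty}.
\]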

The argument before Definition 16.1.2 shows that this approach "exhausts" all
stationary sequences given on $(\Omega, \mathfrak{F}, \mathbf{P})$, i.e. to any stationary sequence $\boldsymbol{\xi}$ we can
relate a measure preserving transformation $T$ and a random variable $\xi = \xi_0$ such
that $\xi_k(\omega) = \xi_0(T^k\omega)$. In this construction, we consider the "sample probability
space" $(\mathbb{R}^\infty, \mathfrak{B}^\infty, \mathbf{P})$ for which $\boldsymbol{\xi}(\omega) = \omega$, $\theta\boldsymbol{\xi} = T\omega$. The transformation $\theta = T$ (that is,
transformation (16.1.1)) will be called the pathwise shift transformation. It always
exists and "generates" any stationary sequence.

Now we will give some simpler examples of (bidirectional) measure preserving 
transformations. 


Example 16.1.1 Let $\Omega = \{\omega_1, \ldots, \omega_d\}$, $d \ge 2$, be a finite set, $\mathfrak{F}$ be the $\sigma$-algebra of
all its subsets, $T\omega_i = \omega_{i+1}$, $1 \le i \le d - 1$, and $T\omega_d = \omega_1$. If $\mathbf{P}(\omega_i) = 1/d$ then $T$
and $T^{-1}$ are measure preserving transformations.


Example 16.1.2 Let $\Omega = [0, 1)$, $\mathfrak{F}$ be the $\sigma$-algebra of Borel sets, $\mathbf{P}$ the Lebesgue
measure and $s$ a fixed number. Then $T\omega = \omega + s \pmod 1$ is a bidirectional measure
preserving transformation.
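
The measure preserving property in both examples can be verified directly on simple sets. The sketch below is illustrative only: the points $\omega_1, \ldots, \omega_d$ are relabelled $0, \ldots, d-1$, the function names are hypothetical, and for the rotation only half-open intervals $[a, b)$ are checked (which suffices in principle, since such intervals generate the Borel $\sigma$-algebra).

```python
from fractions import Fraction


def shift_preimage(subset, d):
    """T^{-1}A for the cyclic shift T(i) = i + 1 (mod d) on {0, ..., d-1}."""
    return {i for i in range(d) if (i + 1) % d in subset}


def rotation_preimage_length(a, b, s):
    """Lebesgue measure of T^{-1}[a, b) for T(w) = w + s (mod 1), 0 <= a <= b <= 1."""
    if b - a == 1:
        return Fraction(1)  # the preimage of the whole space is the whole space
    # T^{-1}[a, b) is [a - s, b - s) reduced mod 1: one interval, or two after wrapping.
    lo, hi = (a - s) % 1, (b - s) % 1
    return hi - lo if lo <= hi else (1 - lo) + hi


# Cyclic shift with the uniform measure P(A) = |A| / d.
d, A = 5, {0, 2, 3}
assert len(shift_preimage(A, d)) == len(A)

# Rotation by a hypothetical shift s, checked on an interval whose preimage wraps around.
s = Fraction(4, 5)
a, b = Fraction(3, 4), Fraction(9, 10)
assert rotation_preimage_length(a, b, s) == b - a
```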


In these examples, the spaces $\Omega$ are rather small, which allows one to construct
on them only stationary sequences with deterministic or almost deterministic
dependence between their elements. If we choose in Example 16.1.1 the variable $\xi$ so
that all $\xi(\omega_i)$ are different, then the value $\xi_k(\omega) = \xi(T^k\omega)$ will uniquely determine
$T^k\omega$ and thereby $T^{k+1}\omega$ and $\xi_{k+1}(\omega)$. The same can be said of Example 16.1.2 in
the case when $\xi(\omega)$, $\omega \in [0, 1)$, is a monotone function of $\omega$.

As our argument at the beginning of the section shows, the space $\Omega = \mathbb{R}^\infty$ is
large enough to construct on it any stationary sequence.

Thus, we see that the concept of a measure preserving transformation arises in
a natural way when studying stationary processes. But not only in that case. It also
arises, for instance, while studying the dynamics of some physical systems. Indeed,
the whole above argument remains valid if we consider on $(\Omega, \mathfrak{F})$ an arbitrary
measure $\mu$ instead of the probability $\mathbf{P}$. For example, for $\Omega = \mathbb{R}^m$, the value $\mu(A)$,
$A \in \mathfrak{F}$, could be the Lebesgue measure (volume) of the set $A$. The measure preserving
property of the transformation $T$ will mean that any set $A$, after the transformation $T$
has acted on it (which, say, corresponds to the change of the physical system's state
in one unit of time), will retain its volume. This property is rather natural for
incompressible liquids. Many laws to be established below will be equally applicable to
such physical systems.

Returning to probabilistic models, i.e. to the case when the measure is a probability
distribution, it turns out that, in that case, for any set $A$ with $\mathbf{P}(A) > 0$, the
"trajectory" $T^n\omega$ will visit $A$ infinitely often for almost all (with respect to the
measure $\mathbf{P}$) $\omega \in A$.


Theorem 16.1.1 (Poincaré) Let $T$ be a measure preserving transformation and
$A \in \mathfrak{F}$. Then, for almost all $\omega \in A$, the relation $T^n\omega \in A$ holds for infinitely many
$n \ge 1$.


Proof Put $N := \{\omega \in A : T^n\omega \notin A \text{ for all } n \ge 1\}$. Because $\{\omega : T^n\omega \in A\} \in \mathfrak{F}$, it is
not hard to see that $N \in \mathfrak{F}$. Clearly, $N \cap T^{-n}N = \varnothing$ for any $n \ge 1$, and
$T^{-m}N \cap T^{-(m+n)}N = T^{-m}(N \cap T^{-n}N) = \varnothing$. This means that we have infinitely many sets
$T^{-n}N$, $n = 0, 1, 2, \ldots$, which are disjoint and have one and the same probability.
This evidently implies that $\mathbf{P}(N) = 0$.

Thus, for each $\omega \in A \setminus N$, there exists an $n_1 = n_1(\omega)$ such that $T^{n_1}\omega \in A$. Now
we apply this assertion to the measure preserving mapping $T_k = T^k$, $k \ge 1$. Then, for
each $\omega \in A \setminus N_k$, $\mathbf{P}(N_k) = 0$, there exists an $n_k = n_k(\omega) \ge 1$ such that $(T^k)^{n_k}\omega \in A$.
Since $kn_k \ge k$, the theorem is proved.
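
Recurrence is easy to observe for the rotation of Example 16.1.2. The following is a minimal sketch in floating-point arithmetic, so the shift $\sqrt{2} - 1$ is irrational only up to rounding; the function name, the starting point and the interval $A = [0, 0.1)$ are hypothetical choices.

```python
import math


def return_times(omega, s, a, b, n_max):
    """Times 1 <= n <= n_max at which T^n(omega) = omega + n*s (mod 1) lies in [a, b)."""
    return [n for n in range(1, n_max + 1) if a <= (omega + n * s) % 1.0 < b]


s = math.sqrt(2) - 1  # irrational shift, up to floating-point rounding
times = return_times(0.05, s, 0.0, 0.1, 10_000)

# The starting point 0.05 lies in A = [0, 0.1); its trajectory keeps coming back.
print(len(times), times[:5])
```

The number of returns keeps growing with `n_max`, as the theorem predicts for almost every starting point in $A$.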


Corollary 16.1.1 Let $\xi(\omega) \ge 0$ and $A = \{\omega : \xi(\omega) > 0\}$. Then, for almost all $\omega \in A$,
\[
\sum_{n=0}^{\infty} \xi(T^n\omega) = \infty.
\]

Proof Put $A_k = \{\omega : \xi(\omega) \ge 1/k\} \subset A$. Then by Theorem 16.1.1 the above series
diverges for almost all $\omega \in A_k$. It remains to notice that $A = \bigcup_k A_k$.


Remark 16.1.1 Formally, one does not need condition $\mathbf{P}(A) > 0$ in Theorem 16.1.1
and Corollary 16.1.1. However, in the absence of that condition, the assertions may
become meaningless, since the set $A \setminus N$ in the proof of Theorem 16.1.1 can turn out
to be empty. Suppose, for example, that in the conditions of Example 16.1.2, $A$ is a
one-point set: $A = \{\omega\}$, $\omega \in [0, 1)$. If $s$ is irrational, then $T^k\omega$ will never be in $A$ for
$k \ge 1$. Indeed, if we assume the contrary, then we will infer that there exist integers
$k$ and $m$ such that $\omega + sk - m = \omega$, $s = m/k$, which contradicts the irrationality
of $s$.


16.2 Ergodicity (Metric Transitivity), Mixing and Weak 
Dependence 


Definition 16.2.1 A set $A \in \mathfrak{F}$ is said to be invariant (with respect to a measure
preserving transformation $T$) if $T^{-1}A = A$. A set $A \in \mathfrak{F}$ is said to be almost invariant
if the sets $T^{-1}A$ and $A$ differ from each other by a set of probability zero:
$\mathbf{P}(A \oplus T^{-1}A) = 0$, where $A \oplus B = A\overline{B} \cup \overline{A}B$ is the symmetric difference.


It is evident that the class of all invariant (almost invariant) sets forms a $\sigma$-algebra
which will be denoted by $\mathfrak{I}$ ($\mathfrak{I}^*$).


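Lemma 16.2.1 If $A$ is an almost invariant set then there exists an invariant set $B$
such that $\mathbf{P}(A \oplus B) = 0$.


Proof Put $B = \limsup_{n\to\infty} T^{-n}A$ (recall that $\limsup_{n\to\infty} A_n = \bigcap_{n=1}^{\infty}\bigcup_{k=n}^{\infty} A_k$ is
the set of all points which belong to infinitely many sets $A_k$). Then
\[
T^{-1}B = \limsup_{n\to\infty} T^{-(n+1)}A = B,
\]
i.e. $B \in \mathfrak{I}$. It is not hard to see that
\[
A \oplus B \subset \bigcup_{k=0}^{\infty} \big(T^{-k}A \oplus T^{-(k+1)}A\big).
\]
Since
\[
\mathbf{P}\big(T^{-k}A \oplus T^{-(k+1)}A\big) = \mathbf{P}(A \oplus T^{-1}A) = 0,
\]
we have $\mathbf{P}(A \oplus B) = 0$. The lemma is proved.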


Definition 16.2.2 A measure preserving transformation T is said to be ergodic (or 
metric transitive) if each invariant set has probability zero or one. 


A stationary sequence $\{\xi_k\}$ associated with such $T$ (i.e. the sequence which
generated $T$ or was generated by $T$) is also said to be ergodic (metric transitive).


Lemma 16.2.2 A transformation T is ergodic if and only if each almost invariant 
set has probability 0 or 1. 


Proof Let $T$ be ergodic and $A \in \mathfrak{I}^*$. Then by Lemma 16.2.1 there exists an invariant
set $B$ such that $\mathbf{P}(A \oplus B) = 0$. Because $\mathbf{P}(B) = 0$ or 1, the probability $\mathbf{P}(A) = 0$ or 1.
The converse assertion is obvious.


Definition 16.2.3 A random variable $\zeta = \zeta(\omega)$ is said to be invariant (almost
invariant) if $\zeta(\omega) = \zeta(T\omega)$ for all $\omega \in \Omega$ (for almost all $\omega \in \Omega$).


Theorem 16.2.1 Let T be a measure preserving transformation. The following 
three conditions are equivalent: 


(1) T is ergodic; 
(2) each almost invariant random variable is a.s. constant; 
(3) each invariant random variable is a.s. constant. 


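Proof (1) $\Rightarrow$ (2). Assume that $T$ is ergodic and $\xi$ is almost invariant, i.e. $\xi(\omega) =
\xi(T\omega)$ a.s. Then, for any $v \in \mathbb{R}$, we have $A_v := \{\omega : \xi(\omega) \le v\} \in \mathfrak{I}^*$ and, by
Lemma 16.2.2, $\mathbf{P}(A_v)$ equals 0 or 1. Put $V := \sup\{v : \mathbf{P}(A_v) = 0\}$. Since $A_v \uparrow \Omega$ as
$v \uparrow \infty$ and $A_v \downarrow \varnothing$ as $v \downarrow -\infty$, one has $|V| < \infty$ and
\[
\mathbf{P}(\xi(\omega) < V) = \mathbf{P}\Big(\bigcup_{n=1}^{\infty}\Big\{\xi(\omega) \le V - \frac{1}{n}\Big\}\Big) = 0.
\]
Similarly, $\mathbf{P}(\xi(\omega) > V) = 0$. Therefore $\mathbf{P}(\xi(\omega) = V) = 1$.
(2) $\Rightarrow$ (3). Obvious.
(3) $\Rightarrow$ (1). Let $A \in \mathfrak{I}$. Then the indicator function $I_A$ is an invariant random
variable, and since it is constant, one has either $I_A = 0$ or $I_A = 1$ a.s. This implies
that $\mathbf{P}(A)$ equals 0 or 1. The theorem is proved.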




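The assertion of the theorem clearly remains valid if one considers in (3) only
bounded random variables. Moreover, if $\xi$ is invariant, then the truncated variable
$\xi^{(N)}(\omega) = \min(\xi, N)$ is also invariant.

Returning to Examples 16.1.1 and 16.1.2, in Example 16.1.1,
\[
\Omega = (\omega_1, \ldots, \omega_d), \qquad T\omega_i = \omega_{i+1 \pmod d}, \qquad \mathbf{P}(\omega_i) = 1/d.
\]
The transformation $T$ is obviously metric transitive.
In Example 16.1.2, $\Omega = [0, 1)$, $T\omega = \omega + s \pmod 1$, and $\mathbf{P}$ is the Lebesgue
measure. We will now show that $T$ is ergodic if and only if $s$ is irrational.
Consider a square integrable random variable $\xi = \xi(\omega)$: $\mathbf{E}\xi^2(\omega) < \infty$. Then by
the Parseval equality, the Fourier series
\[
\xi(\omega) = \sum_{n=0}^{\infty} a_n e^{2\pi i n\omega}
\]
for this function has the property $\sum_{n=0}^{\infty} |a_n|^2 < \infty$. Assume that $s$ is irrational, while
$\xi$ is invariant. Then
\[
\begin{aligned}
a_n &= \mathbf{E}\xi(\omega)e^{-2\pi i n\omega} = \mathbf{E}\xi(T\omega)e^{-2\pi i nT\omega} \\
&= e^{-2\pi i ns}\,\mathbf{E}\xi(T\omega)e^{-2\pi i n\omega} = e^{-2\pi i ns}\,\mathbf{E}\xi(\omega)e^{-2\pi i n\omega} = e^{-2\pi i ns}a_n.
\end{aligned}
\]
For irrational $s$, this equality is only possible when $a_n = 0$, $n \ge 1$, and $\xi(\omega) = a_0 =
\text{const}$. By Theorem 16.2.1 this means that $T$ is ergodic.
Now let $s = m/n$ be rational ($m$ and $n$ are integers). Then the set
\[
A = \bigcup_{k=0}^{n-1}\Big\{\omega : \frac{2k}{2n} \le \omega < \frac{2k+1}{2n}\Big\}
\]
will be invariant and $\mathbf{P}(A) = 1/2$. This means that $T$ is not ergodic.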
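
The invariance of such a set under a rational rotation can be checked exactly. The sketch below assumes that $A$ is the union of the intervals $[2k/(2n), (2k+1)/(2n))$, $k = 0, \ldots, n-1$, so that $\omega \in A$ if and only if $\lfloor 2n\omega \rfloor$ is even; the values $m = 3$, $n = 5$ and the grid of test points are hypothetical.

```python
import math
from fractions import Fraction


def in_A(w, n):
    """Indicator of A = union over k < n of [2k/(2n), (2k+1)/(2n)), for w in [0, 1)."""
    return math.floor(2 * n * w) % 2 == 0


def rotate(w, s):
    """T(w) = w + s (mod 1)."""
    return (w + s) % 1


m, n = 3, 5
s = Fraction(m, n)
grid = [Fraction(j, 1000) for j in range(1000)]

# T^{-1}A = A means that w and T(w) are always on the same side of A.
assert all(in_A(rotate(w, s), n) == in_A(w, n) for w in grid)

# Fraction of grid points in A; equals P(A) here because 2n divides the grid size.
measure = sum(Fraction(1, 1000) for w in grid if in_A(w, n))
assert measure == Fraction(1, 2)
```

Since the indicator of $A$ is a non-constant invariant random variable, Theorem 16.2.1 confirms that this $T$ is not ergodic.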


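Definition 16.2.4 A measure preserving transformation $T$ is called mixing if, for
any $A_1, A_2 \in \mathfrak{F}$, as $n \to \infty$,
\[
\mathbf{P}(A_1 \cap T^{-n}A_2) \to \mathbf{P}(A_1)\mathbf{P}(A_2). \tag{16.2.1}
\]

Now consider the stationary sequence $\boldsymbol{\xi} = (\xi_0, \xi_1, \ldots)$ generated by the
transformation $T$: $\xi_k(\omega) = \xi_0(T^k\omega)$.


Definition 16.2.5 A stationary sequence $\boldsymbol{\xi}$ is said to be weakly dependent if $\xi_k$ and
$\xi_{k+n}$ are asymptotically independent as $n \to \infty$, i.e. for any $B_1, B_2 \in \mathfrak{B}$
\[
\mathbf{P}(\xi_k \in B_1,\ \xi_{k+n} \in B_2) \to \mathbf{P}(\xi_0 \in B_1)\,\mathbf{P}(\xi_0 \in B_2). \tag{16.2.2}
\]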


Theorem 16.2.2 A measure preserving transformation T is mixing if and only if 
any stationary sequence & generated by T is weakly dependent. 


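Proof Let $T$ be mixing. Put $A_i := \xi_0^{-1}(B_i)$, $i = 1, 2$, and set $k = 0$ in (16.2.2). Then
\[
\mathbf{P}(\xi_0 \in B_1,\ \xi_n \in B_2) = \mathbf{P}(A_1 \cap T^{-n}A_2) \to \mathbf{P}(A_1)\mathbf{P}(A_2).
\]
Now assume any sequence generated by $T$ is weakly dependent. For any given
$A_1, A_2 \in \mathfrak{F}$, define the random variable
\[
\xi(\omega) = \begin{cases}
0 & \text{if } \omega \notin A_1 \cup A_2; \\
1 & \text{if } \omega \in A_1\overline{A}_2; \\
2 & \text{if } \omega \in A_1A_2; \\
3 & \text{if } \omega \in \overline{A}_1A_2;
\end{cases}
\]
and put $\xi_k(\omega) := \xi(T^k\omega)$. Then, as $n \to \infty$,
\[
\mathbf{P}(A_1 \cap T^{-n}A_2) = \mathbf{P}(0 < \xi_0 < 3,\ \xi_n \ge 2) \to \mathbf{P}(0 < \xi_0 < 3)\,\mathbf{P}(\xi_0 \ge 2)
= \mathbf{P}(A_1)\mathbf{P}(A_2).
\]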


The theorem is proved. 


Let $\{X_n\}$ be a stationary real-valued Markov chain with an invariant distribution
$\pi$ that satisfies the conditions of the ergodic theorem, i.e. such that, for any $B \in \mathfrak{B}$
and $x \in \mathbb{R}$, as $n \to \infty$,
\[
\mathbf{P}(X_n \in B \mid X_0 = x) \to \pi(B).
\]
Then $\{X_n\}$ is weakly dependent, and therefore, by Theorem 16.2.2, the respective
transformation $T$ is mixing. Indeed,
\[
\mathbf{P}(X_0 \in B_1,\ X_n \in B_2) = \mathbf{E}\,I(X_0 \in B_1)\,\mathbf{P}(X_n \in B_2 \mid X_0),
\]
where the last factor converges to $\pi(B_2)$ for each $X_0$. Therefore the above probability
tends to $\pi(B_2)\pi(B_1)$.
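
For a finite-state chain this convergence can be computed directly from powers of the transition matrix. The sketch below uses a hypothetical two-state chain (states 0 and 1, viewed as real values); the matrix `p` and its invariant distribution `pi` are illustrative choices, not taken from the text.

```python
def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def joint_prob(pi, p, n, i, j):
    """P(X_0 = i, X_n = j) = pi_i * (P^n)_{ij} for the stationary chain."""
    pn = [[1.0, 0.0], [0.0, 1.0]]  # identity matrix, i.e. P^0
    for _ in range(n):
        pn = mat_mul(pn, p)
    return pi[i] * pn[i][j]


p = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical transition matrix
pi = [2 / 3, 1 / 3]           # its invariant distribution: pi P = pi

for n in (1, 5, 50):
    gap = max(abs(joint_prob(pi, p, n, i, j) - pi[i] * pi[j])
              for i in range(2) for j in range(2))
    print(n, gap)
```

The printed gap between $\mathbf{P}(X_0 = i, X_n = j)$ and $\pi_i\pi_j$ shrinks geometrically in $n$, which is the weak dependence property (16.2.2) for one-point sets $B_1$, $B_2$.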

Further characterisations of the mixing property will be given in Theorems 16.2.4 
and 16.2.5. 

Now we will introduce some notions that are somewhat broader than those from 
Definitions 16.2.4 and 16.2.5. 


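Definition 16.2.6 A transformation $T$ is called mixing on the average if, as $n \to \infty$,
\[
\frac{1}{n}\sum_{k=1}^{n} \mathbf{P}(A_1 \cap T^{-k}A_2) \to \mathbf{P}(A_1)\mathbf{P}(A_2). \tag{16.2.3}
\]
A stationary sequence $\boldsymbol{\xi}$ is said to be weakly dependent on the average if
\[
\frac{1}{n}\sum_{k=1}^{n} \mathbf{P}(\xi_0 \in B_1,\ \xi_k \in B_2) \to \mathbf{P}(\xi_0 \in B_1)\,\mathbf{P}(\xi_0 \in B_2). \tag{16.2.4}
\]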




Theorem 16.2.3 A measure preserving transformation $T$ is mixing on the average
if and only if any stationary sequence $\xi$ generated by $T$ is weakly dependent on the
average.


The Proof is the same as for Theorem 16.2.2, and is left to the reader. 


If $\{X_n\}$ is a periodic real-valued Markov chain with period $d$ such that each
of the embedded sub-chains $\{X_{i+nd}\}_{n=0}^{\infty}$, $i = 0, \ldots, d-1$, satisfies the ergodicity
conditions with invariant distributions $\pi^{(i)}$ on disjoint sets $\mathcal{X}_0, \ldots, \mathcal{X}_{d-1}$, then the
"common" invariant distribution $\pi$ will be equal to $d^{-1} \sum_{i=0}^{d-1} \pi^{(i)}$, and the chain
$\{X_n\}$ will be weakly dependent on the average. At the same time, it will clearly not
be weakly dependent for $d > 1$.
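
A minimal sketch of this effect, assuming the simplest periodic chain one can think of: a deterministic cycle on $d$ states started from the uniform (invariant) distribution. This chain and the function names are illustrative assumptions, not an example from the text. The joint probabilities $P(X_0 = 0, X_k = 0)$ oscillate between $1/d$ and $0$, while their Cesàro averages settle at $1/d^2 = \pi(\{0\})^2$.

```python
def joint_return(k, d):
    """P(X_0 = 0, X_k = 0) for the stationary deterministic cycle on d states."""
    return 1.0 / d if k % d == 0 else 0.0


def cesaro(n, d):
    """(1/n) * sum_{k=1}^{n} P(X_0 = 0, X_k = 0), the left-hand side of (16.2.4)."""
    return sum(joint_return(k, d) for k in range(1, n + 1)) / n
```

With `d = 3`, `joint_return(k, 3)` never converges, but `cesaro(n, 3)` approaches `1/9`, so the chain is weakly dependent on the average without being weakly dependent.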


Theorem 16.2.4 A measure preserving transformation T is ergodic if and only if it 
is mixing on the average. 


Proof Let $T$ be mixing on the average, and $A_1 \in \mathfrak{F}$, $A_2 \in \mathfrak{I}$. Then $A_2 = T^{-1} A_2$
and hence $P(A_1 \cap T^{-k} A_2) = P(A_1 A_2)$ for all $k \ge 1$. Therefore, (16.2.3) means that
$P(A_1 A_2) = P(A_1) P(A_2)$. For $A_1 = A_2$ we get $P(A_2) = P^2(A_2)$, and consequently
$P(A_2)$ equals 0 or 1.

We postpone the proof of the converse assertion until the next section. 


Now we will give one more important property of ergodic transforms. 


Theorem 16.2.5 A measure preserving transformation $T$ is ergodic if and only if,
for any $A \in \mathfrak{F}$ with $P(A) > 0$, one has

$$P\Big( \bigcup_{n=0}^{\infty} T^{-n} A \Big) = 1. \tag{16.2.5}$$

Note that property (16.2.5) means that the sets $T^{-n} A$, $n = 0, 1, \ldots$, "exhaust"
the whole space $\Omega$, which associates well with the term "mixing".


Proof Let $T$ be ergodic. Put $B := \bigcup_{n=0}^{\infty} T^{-n} A$. Then $T^{-1} B \subset B$. Because $T$ is
measure preserving, one also has that $P(T^{-1} B) = P(B)$. From this it follows that
$T^{-1} B = B$ up to a set of measure 0 and therefore $B$ is almost invariant. Since $T$ is
ergodic, $P(B)$ equals 0 or 1. But $P(B) \ge P(A) > 0$, and hence $P(B) = 1$.

Conversely, if $T$ is not ergodic, then there exists an invariant set $A$ such that
$0 < P(A) < 1$ and, therefore, for this set $T^{-n} A = A$ holds and

$$P(B) = P(A) < 1.$$


The theorem is proved. 




Remark 16.2.1 In Sects. 16.1 and 16.2 we tacitly or explicitly assumed (mainly for
the sake of simplicity of the exposition) that the components $\xi_k$ of the stationary
sequence $\xi$ are real. However, we never actually used this, and so we could, as
we did while studying Markov chains, assume that the state space $\mathcal{X}$, in which
$\xi_k$ take their values, is an arbitrary measurable space. In the next section we will
substantially use the fact that $\xi_k$ are real- or vector-valued.


16.3 The Ergodic Theorem 
For a sequence $\xi_1, \xi_2, \ldots$ of independent identically distributed random variables
we proved in Chap. 11 the strong law of large numbers:

$$\frac{S_n}{n} \xrightarrow{\text{a.s.}} E\xi_1, \quad \text{where } S_n = \sum_{k=0}^{n-1} \xi_k.$$

Now we will prove the same assertion under much broader assumptions: for
stationary ergodic sequences, i.e. for sequences that are weakly dependent on the
average.

Let $\{\xi_k\}$ be an arbitrary strictly stationary sequence, $T$ be the associated measure
preserving transformation, and $\mathfrak{I}$ be the σ-algebra of invariant sets.


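Theorem 16.3.1 (Birkhoff–Khintchin) If $E|\xi_0| < \infty$ then

$$\frac{1}{n} \sum_{k=0}^{n-1} \xi_k \xrightarrow{\text{a.s.}} E(\xi_0 \mid \mathfrak{I}). \tag{16.3.1}$$

If the sequence $\{\xi_k\}$ (or transformation $T$) is ergodic, then

$$\frac{1}{n} \sum_{k=0}^{n-1} \xi_k \xrightarrow{\text{a.s.}} E\xi_0. \tag{16.3.2}$$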
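
The ergodic case (16.3.2) can be illustrated numerically. The sketch below assumes a standard ergodic transformation that is not discussed in this section, the rotation $T\omega = \omega + \alpha \pmod 1$ of $[0, 1)$ with irrational $\alpha$ and Lebesgue measure; the observable `xi` and all names are illustrative choices. The time average along a single orbit should be close to the space average $E\xi = 1/2$.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2  # irrational rotation angle (illustrative choice)


def rotate(omega):
    """T(omega) = omega + ALPHA (mod 1); preserves Lebesgue measure on [0, 1)."""
    return (omega + ALPHA) % 1.0


def xi(omega):
    """Observable cos^2(2*pi*omega), whose space average is 1/2."""
    return math.cos(2 * math.pi * omega) ** 2


def time_average(omega, n):
    """(1/n) * sum_{k=0}^{n-1} xi(T^k omega)."""
    total = 0.0
    for _ in range(n):
        total += xi(omega)
        omega = rotate(omega)
    return total / n
```

The same value is approached from any starting point `omega`, which is what distinguishes the ergodic case from the general limit $E(\xi_0 \mid \mathfrak{I})$.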


Below we will be using the representation $\xi_k = \xi(T^k \omega)$ for $\xi = \xi_0$. We will need
the following auxiliary result.


Lemma 16.3.1 Set

$$S_n(\omega) = \sum_{k=0}^{n-1} \xi(T^k \omega), \qquad M_k(\omega) = \max\{0, S_1(\omega), \ldots, S_k(\omega)\}.$$

Then, under the conditions of Theorem 16.3.1,

$$E\big[\xi(\omega) I_{\{M_n > 0\}}(\omega)\big] \ge 0$$

for any $n \ge 1$.


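Proof For all $k \le n$, one has $S_k(T\omega) \le M_n(T\omega)$, and hence

$$\xi(\omega) + M_n(T\omega) \ge \xi(\omega) + S_k(T\omega) = S_{k+1}(\omega).$$

Because $\xi(\omega) \ge S_1(\omega) - M_n(T\omega)$, we have

$$\xi(\omega) \ge \max(S_1(\omega), \ldots, S_n(\omega)) - M_n(T\omega).$$

Further, since

$$\{M_n(\omega) > 0\} = \{\max(S_1(\omega), \ldots, S_n(\omega)) > 0\},$$

we obtain that

$$\begin{aligned} E\big[\xi(\omega) I_{\{M_n > 0\}}(\omega)\big] &\ge E\big(\max(S_1(\omega), \ldots, S_n(\omega)) - M_n(T\omega)\big) I_{\{M_n > 0\}}(\omega) \\ &\ge E\big(M_n(\omega) - M_n(T\omega)\big) I_{\{M_n > 0\}}(\omega) \\ &\ge E\big(M_n(\omega) - M_n(T\omega)\big) = 0. \end{aligned}$$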


The lemma is proved. 
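
The inequality of Lemma 16.3.1 holds for every measure preserving transformation, so it can be verified exactly on a finite space. The sketch below assumes the cyclic shift $T\omega = \omega + 1 \pmod d$ on $\{0, \ldots, d-1\}$ with the uniform measure; the values of $\xi$ are arbitrary (here pseudo-random), and the function name is ours.

```python
import random


def maximal_lemma_value(values, n):
    """E[xi * I(M_n > 0)] on {0,...,d-1} with uniform P and T(w) = w + 1 (mod d)."""
    d = len(values)
    total = 0.0
    for w in range(d):
        s, m = 0.0, 0.0
        for k in range(n):
            s += values[(w + k) % d]  # s is now S_{k+1}(w)
            m = max(m, s)             # m is now max(0, S_1(w), ..., S_{k+1}(w))
        if m > 0:
            total += values[w]        # xi(w) contributes only on {M_n > 0}
    return total / d


rng = random.Random(0)
VALUES = [rng.uniform(-1.0, 0.5) for _ in range(12)]
```

Although `VALUES` has a negative mean, `maximal_lemma_value(VALUES, n)` stays nonnegative for every `n`, in agreement with the lemma.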


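Proof of Theorem 16.3.1 Assertion (16.3.2) is an evident consequence of (16.3.1),
because, for ergodic $T$, the σ-algebra $\mathfrak{I}$ is trivial and $E(\xi \mid \mathfrak{I}) = E\xi$ a.s. Hence, it
suffices to prove (16.3.1).

Without loss of generality, we can assume that $E(\xi \mid \mathfrak{I}) = 0$, for one can always
consider $\xi - E(\xi \mid \mathfrak{I})$ instead of $\xi$.

Let $\overline{S} := \limsup_{n \to \infty} n^{-1} S_n$ and $\underline{S} := \liminf_{n \to \infty} n^{-1} S_n$. To prove the theorem,
it suffices to establish that

$$0 \le \underline{S} \le \overline{S} \le 0 \quad \text{a.s.} \tag{16.3.3}$$

Since $\overline{S}(\omega) = \overline{S}(T\omega)$, the random variable $\overline{S}$ is invariant and hence the set
$A_\varepsilon = \{\overline{S}(\omega) > \varepsilon\}$ is also invariant for any $\varepsilon > 0$. Introduce the variables

$$\xi^*(\omega) := (\xi(\omega) - \varepsilon) I_{A_\varepsilon}(\omega),$$
$$S^*_k(\omega) := \xi^*(\omega) + \cdots + \xi^*(T^{k-1}\omega),$$
$$M^*_k(\omega) := \max(0, S^*_1, \ldots, S^*_k).$$

Then, by Lemma 16.3.1, for any $n \ge 1$, one has

$$E \xi^* I_{\{M^*_n > 0\}} \ge 0.$$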
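But, as $n \to \infty$,

$$\{M^*_n > 0\} = \Big\{\max_{1 \le k \le n} S^*_k > 0\Big\} \uparrow \Big\{\sup_{k \ge 1} S^*_k > 0\Big\} = \Big\{\sup_{k \ge 1} \frac{S^*_k}{k} > 0\Big\} = \Big\{\sup_{k \ge 1} \frac{S_k}{k} > \varepsilon\Big\} \cap A_\varepsilon = A_\varepsilon.$$

The last equality follows from the observation that

$$A_\varepsilon = \{\overline{S} > \varepsilon\} \subset \Big\{\sup_{k \ge 1} \frac{S_k}{k} > \varepsilon\Big\}.$$

Further, $E|\xi^*| \le E|\xi| + \varepsilon$. Hence, by the dominated convergence theorem,

$$0 \le E \xi^* I_{\{M^*_n > 0\}} \to E \xi^* I_{A_\varepsilon}.$$

Consequently,

$$0 \le E \xi^* I_{A_\varepsilon} = E(\xi - \varepsilon) I_{A_\varepsilon} = E \xi I_{A_\varepsilon} - \varepsilon P(A_\varepsilon) = E\, I_{A_\varepsilon} E(\xi \mid \mathfrak{I}) - \varepsilon P(A_\varepsilon) = -\varepsilon P(A_\varepsilon).$$

This implies that $P(A_\varepsilon) = 0$ for any $\varepsilon > 0$, and therefore $P(\overline{S} \le 0) = 1$.
In a similar way, considering the variables $-\xi$ instead of $\xi$, we obtain that

$$\limsup_{n \to \infty} \Big(-\frac{S_n}{n}\Big) = -\liminf_{n \to \infty} \frac{S_n}{n} = -\underline{S},$$

and $P(-\underline{S} \le 0) = 1$, $P(\underline{S} \ge 0) = 1$. The required inequalities (16.3.3), and therefore
the theorem itself, are proved.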


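Now we can complete the

Proof of Theorem 16.2.4 It remains to show that the ergodicity of $T$ implies mixing
on the average. Indeed, let $T$ be ergodic and $A_1, A_2 \in \mathfrak{F}$. Then, by Theorem 16.3.1,
we have

$$\zeta_n := \frac{1}{n} \sum_{k=1}^{n} I(T^{-k} A_2) \xrightarrow{\text{a.s.}} P(A_2), \qquad \zeta_n I(A_1) \xrightarrow{\text{a.s.}} I(A_1) P(A_2).$$

Since $\zeta_n I(A_1)$ are bounded, one also has the convergence

$$E \zeta_n I(A_1) \to P(A_2) \cdot P(A_1).$$

Therefore

$$\frac{1}{n} \sum_{k=1}^{n} P(A_1 \cap T^{-k} A_2) = E\, I(A_1) \zeta_n \to P(A_1) P(A_2).$$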
k=1 


The theorem is proved. 


Now we will show that convergence in mean also holds in (16.3.1) and (16.3.2). 




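Theorem 16.3.2 Under the assumptions of Theorem 16.3.1, one has along with
(16.3.1) and (16.3.2) that, respectively,

$$E \Big| \frac{1}{n} \sum_{k=0}^{n-1} \xi_k - E(\xi_0 \mid \mathfrak{I}) \Big| \to 0 \tag{16.3.4}$$

and

$$E \Big| \frac{1}{n} \sum_{k=0}^{n-1} \xi_k - E\xi_0 \Big| \to 0 \tag{16.3.5}$$

as $n \to \infty$.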


Proof The assertion of the theorem follows in an obvious way from
Theorems 16.3.1, 6.1.7 and the uniform integrability of the sums

$$\frac{1}{n} \sum_{k=0}^{n-1} \xi_k,$$

which follows from Theorem 6.1.6.


Corollary 16.3.1 If $\{\xi_k\}$ is a stationary metric transitive sequence and $a = E\xi_k < 0$,
then $S(\omega) = \sup_{k \ge 0} S_k(\omega)$ is a proper random variable.

The proof is obvious since, for $0 < \varepsilon < -a$, one has $S_k < (a + \varepsilon)k < 0$ for all
$k \ge n(\omega) < \infty$.

An unusual feature of Theorem 16.3.1 when compared with the strong law of
large numbers from Chap. 11 is that the limit of

$$\frac{1}{n} \sum_{k=0}^{n-1} \xi_k$$

can be a random variable. For instance, let $T\omega_k := \omega_{k+2}$ and $d = 2l$ be even in the
situation of Example 16.1.1. Then the transformation $T$ will not be ergodic, since
the set $A = \{\omega_1, \omega_3, \ldots, \omega_{d-1}\}$ will be invariant, while $P(A) = 1/2$.

On the other hand, it is evident that, for any function $\xi(\omega)$, the sum

$$\frac{1}{n} \sum_{k=0}^{n-1} \xi(T^k \omega)$$

will converge with probability 1/2 to

$$\frac{1}{l} \sum_{j=0}^{l-1} \xi(\omega_{2j+1})$$

(if $\omega = \omega_i$ and $i$ is odd) and with probability 1/2 to

$$\frac{1}{l} \sum_{j=1}^{l} \xi(\omega_{2j})$$

(if $\omega = \omega_i$ and $i$ is even). This limiting distribution is just the distribution of $E(\xi \mid \mathfrak{I})$.
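
The two possible limits are easy to compute directly. The sketch below assumes that in Example 16.1.1 the space consists of $d$ equally likely points $\omega_1, \ldots, \omega_d$ with indices understood cyclically; the values of $\xi$ in `XI` are hypothetical.

```python
def shift_by_two(i, d):
    """T maps omega_i to omega_{i+2}, with indices taken cyclically in 1..d."""
    return (i + 1) % d + 1


def time_average(xi, i, n):
    """(1/n) * sum_{k=0}^{n-1} xi(T^k omega_i) for a function xi given as a dict."""
    total = 0.0
    for _ in range(n):
        total += xi[i]
        i = shift_by_two(i, len(xi))
    return total / n


# Hypothetical function on Omega = {omega_1, ..., omega_6}, so d = 6 and l = 3.
XI = {1: 1.0, 2: 10.0, 3: 2.0, 4: 20.0, 5: 3.0, 6: 30.0}
```

Starting from an odd index the time average equals the mean of `XI` over odd indices, and starting from an even index it equals the mean over even indices, so the limit depends on $\omega$ only through the invariant set containing it.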


Chapter 17 
Stochastic Recursive Sequences 


Abstract The chapter begins with introducing the concept of stochastic recursive
sequences in Sect. 17.1. The idea of renovating events together with the key results
on ergodicity of stochastic recursive sequences and the boundedness thereof is
presented in Sect. 17.2, whereas the Loynes ergodic theorem for the case of monotone
functions specifying the recursion is proved in Sect. 17.3. Section 17.4 establishes
ergodicity conditions for contracting in mean Lipschitz transformations.


17.1 Basic Concepts 


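Consider two measurable state spaces $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ and $(\mathcal{Y}, \mathfrak{B}_{\mathcal{Y}})$, and let $\{\xi_n\}$ be a
sequence of random elements taking values in $\mathcal{Y}$. If $(\Omega, \mathfrak{F}, P)$ is the underlying
probability space, then $\{\omega : \xi_k \in B\} \in \mathfrak{F}$ for any $B \in \mathfrak{B}_{\mathcal{Y}}$. Assume, moreover,
that a measurable function $f : \mathcal{X} \times \mathcal{Y} \to \mathcal{X}$ is given on the measurable space
$(\mathcal{X} \times \mathcal{Y}, \mathfrak{B}_{\mathcal{X}} \times \mathfrak{B}_{\mathcal{Y}})$, where $\mathfrak{B}_{\mathcal{X}} \times \mathfrak{B}_{\mathcal{Y}}$ denotes the σ-algebra generated by sets
$A \times B$ with $A \in \mathfrak{B}_{\mathcal{X}}$ and $B \in \mathfrak{B}_{\mathcal{Y}}$.

For simplicity's sake, by $\mathcal{X}$ and $\mathcal{Y}$ we can understand the real line $\mathbb{R}$, and by $\mathfrak{B}_{\mathcal{X}}$,
$\mathfrak{B}_{\mathcal{Y}}$ the σ-algebras of Borel sets.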


Definition 17.1.1 A sequence $\{X_n\}$, $n = 0, 1, \ldots$, taking values in $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ is said
to be a stochastic recursive sequence (s.r.s.) driven by the sequence $\{\xi_n\}$ if $X_n$
satisfies the relation

$$X_{n+1} = f(X_n, \xi_n) \tag{17.1.1}$$

for all $n \ge 0$. For simplicity's sake we will assume that the initial state $X_0$ is
independent of $\{\xi_n\}$.
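
In computational terms, an s.r.s. is obtained by folding the function $f$ over the driving sequence. The sketch below is only an illustration: `srs_path` is a name introduced here, and the sample map `sample_f` with $f(x, y) = \max(0, x + y)$ is one convenient choice of $f$, used purely as an example.

```python
def srs_path(f, x0, driving):
    """Return [X_0, X_1, ..., X_n] with X_{k+1} = f(X_k, xi_k), xi_k = driving[k]."""
    path = [x0]
    for y in driving:
        path.append(f(path[-1], y))
    return path


def sample_f(x, y):
    """An illustrative choice of the function f: f(x, y) = max(0, x + y)."""
    return max(0.0, x + y)
```

For instance, `srs_path(sample_f, 1.0, [-2.0, 0.5, 0.25])` produces the values $X_0 = 1$, $X_1 = 0$, $X_2 = 0.5$, $X_3 = 0.75$. No assumption on the dependence structure of the driving values is needed to run the recursion.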


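The distribution of the sequence $\{X_n, \xi_n\}$ on $((\mathcal{X} \times \mathcal{Y})^\infty, (\mathfrak{B}_{\mathcal{X}} \times \mathfrak{B}_{\mathcal{Y}})^\infty)$ can be
constructed in an obvious way from finite-dimensional distributions similarly to the
manner in which we constructed on $(\mathcal{X}^\infty, \mathfrak{B}_{\mathcal{X}}^\infty)$ the distribution of a Markov chain $X$
with values in $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ from its transition function $P(x, B) = P(X_1(x) \in B)$. The
finite-dimensional distributions of $\{(X_0, \xi_0), \ldots, (X_k, \xi_k)\}$ for the s.r.s. are given by
the relations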


A.A. Borovkov, Probability Theory, Universitext, 507 
DOI 10.1007/978-1-4471-5201-9_17, © Springer-Verlag London 2013 




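$$P(X_l \in A_l,\ \xi_l \in B_l;\ l = 0, \ldots, k) = \int_{B_0} \cdots \int_{B_k} P(\xi_l \in dy_l,\ l = 0, \ldots, k) \prod_{l=1}^{k} I\big(f_l(X_0, y_0, \ldots, y_{l-1}) \in A_l\big),$$

where $f_1(x, y_0) := f(x, y_0)$, $f_l(x, y_0, \ldots, y_{l-1}) := f(f_{l-1}(x, y_0, \ldots, y_{l-2}), y_{l-1})$.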

Without loss of generality, the sequence $\{\xi_n\}$ can be assumed to be given for all
$-\infty < n < \infty$ (as we noted in Sect. 16.1, for a stationary sequence, the required
extension to $n < 0$ can always be achieved with the help of Kolmogorov's theorem).

A stochastic recursive sequence is a more general object than a Markov chain. It
is evident that if $\xi_k$ are independent, then the $X_n$ form a Markov chain. A stronger
assertion is true as well: under broad assumptions about the space $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$, for any
Markov chain $\{X_n\}$ in $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ one can construct a function $f$ and a sequence of
independent identically distributed random variables $\{\xi_n\}$ such that (17.1.1) holds.
We will elucidate this statement in the simplest case when both $\mathcal{X}$ and $\mathcal{Y}$ coincide
with the real line $\mathbb{R}$. Let $P(x, B)$, $B \in \mathfrak{B}$, be the transition function of the chain
$\{X_n\}$, and $F_x(t) = P(x, (-\infty, t))$ the distribution function of $X_1(x)$ ($X_0 = x$). Then
if $F_x^{-1}(t)$ is the function inverse (in $t$) to $F_x(t)$ and $\alpha$ is a random variable uniformly
distributed over $[0, 1]$, then, as we saw before (see e.g. Sect. 6.2), the random variable
$F_x^{-1}(\alpha)$ will have the distribution function $F_x(t)$. Therefore, if $\{\alpha_n\}$ is a sequence of
independent random variables uniformly distributed over $[0, 1]$, then the sequence
$X_{n+1} = F_{X_n}^{-1}(\alpha_n)$ will have the same distribution as the original chain $\{X_n\}$. Thus the
Markov chain is an s.r.s. with the function $f(x, y) = F_x^{-1}(y)$ and driving sequence
$\{\alpha_n\}$, where each $\alpha_n$ is uniformly distributed over $[0, 1]$.
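
The construction $f(x, y) = F_x^{-1}(y)$ can be sketched for a chain with finitely many states, where the inverse of the distribution function reduces to a search through cumulative row sums. The transition matrix `P` below is a hypothetical example, and `quantile_map` and `simulate` are names introduced only for this illustration.

```python
import random

# Hypothetical transition matrix of a Markov chain on the states {0, 1, 2}.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]


def quantile_map(x, u):
    """f(x, u): generalised inverse of the distribution function of row x at u."""
    acc = 0.0
    for j, p in enumerate(P[x]):
        acc += p
        if u < acc:
            return j
    return len(P[x]) - 1  # guard against rounding when u is very close to 1


def simulate(x0, n, seed=1):
    """Run X_{k+1} = f(X_k, alpha_k) with independent uniform alpha_k."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n):
        path.append(quantile_map(path[-1], rng.random()))
    return path
```

The empirical transition frequencies along `simulate(0, n)` should approach the rows of `P` as `n` grows, which is the sense in which the s.r.s. reproduces the original chain.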

For more general state spaces $\mathcal{X}$, a similar construction is possible if the
σ-algebra $\mathfrak{B}_{\mathcal{X}}$ is countably-generated (i.e. is generated by a countable collection of
sets from $\mathcal{X}$). This is always the case for Borel σ-algebras in $\mathcal{X} = \mathbb{R}^d$, $d \ge 1$ (see
[22]).

One can always consider $f(\cdot, \xi_n)$ as a sequence of random mappings of the space
$\mathcal{X}$ into itself. The principal problem we will be interested in is again (as in Chap. 13)
that of the existence of the limiting distribution of $X_n$ as $n \to \infty$.

In the following sections we will consider three basic approaches to this problem. 


17.2 Ergodicity and Renovating Events. Boundedness 
Conditions 


17.2.1 Ergodicity of Stochastic Recursive Sequences 


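We introduce the σ-algebras

$$\mathfrak{F}^\xi_{l,n} := \sigma\{\xi_k;\ l \le k \le n\},$$

$$\mathfrak{F}^\xi_{n} := \sigma\{\xi_k;\ k \le n\} = \mathfrak{F}^\xi_{-\infty,n},$$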




$$\mathfrak{F}^\xi := \sigma\{\xi_k;\ -\infty < k < \infty\} = \mathfrak{F}^\xi_{-\infty,\infty}.$$

In the sequel, for the sake of definiteness and simplicity, we will assume the initial
value $X_0$ to be constant unless otherwise stated.


Definition 17.2.1 An event $A \in \mathfrak{F}^\xi_{n+m}$, $m \ge 0$, is said to be renovating for the s.r.s.
$\{X_n\}$ on the segment $[n, n + m]$ if there exists a measurable function $g : \mathcal{Y}^{m+1} \to \mathcal{X}$
such that, on the set $A$ (i.e. for $\omega \in A$),

$$X_{n+m+1} = g(\xi_n, \ldots, \xi_{n+m}). \tag{17.2.1}$$

It is evident that, for $\omega \in A$, relations of the form $X_{n+m+k+1} = g_k(\xi_n, \ldots, \xi_{n+m+k})$
will hold for all $k \ge 0$, where $g_k$ is a function depending on its arguments only and
determined by the event $A$.


The sequence of events $\{A_n\}$, $A_n \in \mathfrak{F}^\xi_{n+m}$, where the integer $m$ is fixed, is said
to be renovating for the s.r.s. $\{X_n\}$ if there exists an integer $n_0 \ge 0$ such that, for
$n \ge n_0$, one has relation (17.2.1) for $\omega \in A_n$, the function $g$ being common for all $n$.

We will be mainly interested in "positive" renovating events, i.e. renovating
events having positive probabilities $P(A_n) > 0$.

The simplest example of a renovating event is the hitting by the sequence $X_n$ of
a fixed point $x_0$: $A_n = \{X_n = x_0\}$ (here $m = 0$), although such an event could be of
zero probability. Below we will consider a more interesting example.

The motivation behind the introduction of renovating events is as follows. After
the trajectory $\{X_k, \xi_k\}$, $k \le n + m$, has entered a renovating set $A \in \mathfrak{F}^\xi_{n+m}$, the future
evolution of the process will not depend on the values $\{X_k\}$, $k \le n + m$, but will be
determined by the values of $\xi_n, \xi_{n+1}, \ldots$ only. It is not a complete "regeneration" of
the process which we dealt with in Chap. 13 while studying Markov chains (first of
all, because the $\xi_k$ are now, generally speaking, dependent), but it still enables us
to establish ergodicity of the sequence $X_n$ (in approximately the same sense as in
Chap. 13).

Note that, generally speaking, the event $A$ and hence the function $g$ may depend
on the initial value $X_0$. If $X_0$ is random then a renovating event is to be taken from
the σ-algebra $\mathfrak{F}^\xi_{n+m} \times \sigma(X_0)$.

In what follows it will be assumed that the sequence $\{\xi_n\}$ is stationary. The
symbol $U$ will denote the measure preserving shift transformation of $\mathfrak{F}^\xi$-measurable
random variables generated by $\{\xi_n\}$, so that $U\xi_n = \xi_{n+1}$, and the symbol $T$ will denote
the shift transformation of sets (events) from the σ-algebra $\mathfrak{F}^\xi$: $\xi_{n+1}(\omega) = \xi_n(T\omega)$.
The symbols $U^n$ and $T^n$, $n \ge 0$, will denote the powers (iterations) of these
transformations respectively (so that $U^1 = U$, $T^1 = T$; $U^0$ and $T^0$ are identity
transformations), while $U^{-n}$ and $T^{-n}$ are transformations inverse to $U^n$ and $T^n$, respectively.

A sequence of events $\{A_k\}$ is said to be stationary if $A_k = T^k A_0$ for all $k$.


Example 17.2.1 Consider a real-valued sequence

$$X_{n+1} = (X_n + \xi_n)^+, \quad X_0 = \text{const} \ge 0, \quad n \ge 0, \tag{17.2.2}$$

where $x^+ = \max(0, x)$ and $\{\xi_n\}$ is a stationary metric transitive sequence. As we
already know from Sect. 12.4, the sequence $\{X_n\}$ describes the dynamics of waiting
times for customers in a single-channel service system. The difference is that in
Sect. 12.4 the initial value has subscript 1 rather than 0, and that now the sequence
$\{\xi_n\}$ has a more general nature. Furthermore, it was established in Sect. 12.4 that
Eq. (17.2.2) has the solution

$$X_{n+1} = \max(\overline{S}_{n,n},\ X_0 + S_n), \tag{17.2.3}$$

where

$$S_n := \sum_{k=0}^{n} \xi_k, \quad \overline{S}_{n,k} := \max_{-1 \le j \le k} S_{n,j}, \quad S_{n,j} := \sum_{k=n-j}^{n} \xi_k, \quad S_{n,-1} := 0 \tag{17.2.4}$$

(certain changes in the subscripts in (17.2.4) compared with Sect. 12.4 are caused by
the different indexing of the initial values). From representation (17.2.3) one can see
that the event

$$B_n := \{X_0 + S_n \le 0,\ \overline{S}_{n,n} = 0\} \in \mathfrak{F}^\xi_n$$

implies the event $\{X_{n+1} = 0\}$ and so is renovating for $m = 0$, $g(y) \equiv 0$. If $X_{n+1} = 0$
then

$$X_{n+2} = g_1(\xi_n, \xi_{n+1}) := \xi_{n+1}^+, \qquad X_{n+3} = g_2(\xi_n, \xi_{n+1}, \xi_{n+2}) := (\xi_{n+1}^+ + \xi_{n+2})^+,$$

and so on do not depend on $X_0$.
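
The closed-form solution $X_{n+1} = \max(\overline{S}_{n,n}, X_0 + S_n)$ of the recursion $X_{n+1} = (X_n + \xi_n)^+$ can be checked against the recursion itself. In the sketch below the driving values are integer-valued pseudo-random numbers (an assumption made only so that the comparison is exact), and the function names are ours.

```python
import random


def recursion_path(x0, xi):
    """X_0, X_1, ..., computed from X_{n+1} = max(0, X_n + xi_n)."""
    path = [x0]
    for y in xi:
        path.append(max(0, path[-1] + y))
    return path


def closed_form(x0, xi, n):
    """X_{n+1} = max(bar S_{n,n}, X_0 + S_n), S_{n,j} = xi[n-j] + ... + xi[n]."""
    best, acc = 0, 0              # best starts at S_{n,-1} = 0
    for j in range(n + 1):
        acc += xi[n - j]          # acc is now S_{n,j}
        best = max(best, acc)     # best is now bar S_{n,j}
    return max(best, x0 + acc)    # after the loop acc equals S_{n,n} = S_n


rng = random.Random(7)
XI = [rng.randint(-3, 2) for _ in range(200)]
PATH = recursion_path(4, XI)
```

The two computations agree for every `n`; the closed form also makes visible that $X_{n+1}$ stops depending on $X_0$ as soon as $X_0 + S_n \le \overline{S}_{n,n}$.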
Now consider, for some $n_0 > 1$ and any $n \ge n_0$, the narrower event

$$A_n := \Big\{X_0 + \sup_{j \ge n_0} S_{n,j} \le 0,\ \overline{S}_{n,\infty} := \sup_{j \ge -1} S_{n,j} = 0\Big\}$$

(we assume that the sequence $\{\xi_n\}$ is defined on the whole axis). Clearly, $A_n \subset B_n \subset
\{X_{n+1} = 0\}$, so $A_n$ is a renovating event as well. But, unlike $B_n$, the renovating
event $A_n$ is stationary: $A_n = T^n A_0$.
We assume now that $E\xi_0 < 0$ and show that in this case $P(A_0) > 0$ for sufficiently
large $n_0$. In order to do this, we first establish that $P(\overline{S}_{0,\infty} = 0) > 0$. Since, by the
ergodic theorem, $S_{0,j} \xrightarrow{\text{a.s.}} -\infty$ as $j \to \infty$, we see that $\overline{S}_{0,\infty}$ is a proper random
variable and there exists a $v$ such that $P(\overline{S}_{0,\infty} < v) > 0$. By the total probability
formula,

$$0 < P(\overline{S}_{0,\infty} < v) = \sum_{j=0}^{\infty} P\Big(\overline{S}_{0,j-1} < S_{0,j} < v,\ \sup_{k \ge j}(S_{0,k} - S_{0,j}) = 0\Big).$$

Therefore there exists a $j$ such that

$$P\Big(\sup_{k \ge j}(S_{0,k} - S_{0,j}) = 0\Big) > 0.$$

But the supremum in the last expression has the same distribution as $\overline{S}_{0,\infty}$. This
proves that $p := P(\overline{S}_{0,\infty} = 0) > 0$. Next, since $S_{0,j} \xrightarrow{\text{a.s.}} -\infty$, one also has
$\sup_{j \ge k} S_{0,j} \xrightarrow{\text{a.s.}} -\infty$ as $k \to \infty$. Therefore, $P(\sup_{j \ge k} S_{0,j} < -X_0) \to 1$ as $k \to \infty$,
and hence there exists an $n_0$ such that

$$P\Big(\sup_{j \ge n_0} S_{0,j} < -X_0\Big) > 1 - \frac{p}{2}.$$

Since $P(AB) \ge P(A) + P(B) - 1$ for any events $A$ and $B$, the aforesaid implies that
$P(A_0) \ge p/2 > 0$.

In the assertions below, we will use the existence of stationary renovating events
$A_n$ with $P(A_0) > 0$ as a condition ensuring convergence of the s.r.s. $X_n$ to a stationary
sequence. However, in the last example such convergence can be established
directly. Let $E\xi_0 < 0$. Then by (17.2.3), for any fixed $v$,

$$P(X_{n+1} > v) = P(\overline{S}_{n,n} > v) + P(\overline{S}_{n,n} \le v,\ X_0 + S_n > v),$$

where evidently

$$P(X_0 + S_n > v) \to 0, \qquad P(\overline{S}_{n,n} > v) \uparrow P(\overline{S}_{0,\infty} > v)$$

as $n \to \infty$. Hence the following limit exists:

$$\lim_{n \to \infty} P(X_n > v) = P(\overline{S}_{0,\infty} > v). \tag{17.2.5}$$


Recall that in the above example the sequence of events $A_n$ becomes renovating
for $n \ge n_0$. But we can define other renovating events $C_n$ along with a number $m$
and function $g : \mathbb{R}^{m+1} \to \mathbb{R}$ as follows:

$$m := n_0, \qquad C_n := T^m A_n, \qquad g(y_0, \ldots, y_m) :\equiv 0.$$

The events $C_n \in \mathfrak{F}^\xi_{n+m}$ are renovating for $\{X_n\}$ on the segment $[n, n + m]$ for all
$n \ge 0$, so in this case the $n_0$ in the definition of a renovating sequence will be equal
to 0.

A similar argument can also be used in the general case for arbitrary renovating
events. Therefore we will assume in the sequel that the number $n_0$ from the
definition of renovating events is equal to zero.


In the general case, the following assertion is valid. 


Theorem 17.2.1 Let $\{\xi_n\}$ be an arbitrary stationary sequence and suppose that for the
s.r.s. $\{X_n\}$ there exists a sequence of renovating events $\{A_n\}$ such that

$$P\Big( \bigcup_{j=1}^{n} A_j T^{-s} A_{j+s} \Big) \to 1 \quad \text{as } n \to \infty \tag{17.2.6}$$

uniformly in $s \ge 1$. Then one can define, on a common probability space with $\{X_n\}$,
a stationary sequence $\{X^n := U^n X^0\}$ satisfying the equations $X^{n+1} = f(X^n, \xi_n)$
and such that

$$P\{X_k = X^k \text{ for all } k \ge n\} \to 1 \quad \text{as } n \to \infty. \tag{17.2.7}$$

If the sequence $\{\xi_n\}$ is metric transitive and the events $A_n$ are stationary, then the
relations $P(A_0) > 0$ and $P(\bigcup_{n=0}^{\infty} A_n) = 1$ are equivalent and imply (17.2.6) and
(17.2.7).


Note also that if we introduce the measure $\pi(B) = P(X^0 \in B)$ (as we did in
Chap. 13), then (17.2.7) will imply convergence in total variation:

$$\sup_{B \in \mathfrak{B}_{\mathcal{X}}} |P(X_n \in B) - \pi(B)| \to 0 \quad \text{as } n \to \infty.$$


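Proof of Theorem 17.2.1 First we show that (17.2.6) implies that

$$P\Big( \bigcup_{k=0}^{\infty} \{X_{n+k} \ne U^{-s} X_{n+k+s}\} \Big) \to 0 \quad \text{as } n \to \infty \tag{17.2.8}$$

uniformly in $s \ge 0$. For a fixed $s \ge 1$, consider the sequence $X^s_j := U^{-s} X_{s+j}$. It is
defined for $j \ge -s$, and

$$X^s_{-s} = X_0, \qquad X^s_{-s+1} = f(X^s_{-s}, \xi_{-s}) = f(X_0, \xi_{-s})$$

and so on. It is clear that the event

$$\{X_j = X^s_j \text{ for some } j \in [0, n]\}$$

implies the event

$$\{X_{n+k} = X^s_{n+k} \text{ for all } k \ge 0\}.$$

We show that

$$P\Big( \bigcup_{j=1}^{n} \{X_j = X^s_j\} \Big) \to 1 \quad \text{as } n \to \infty.$$

For simplicity's sake put $m = 0$. Then, for the event $X_{j+1} = X^s_{j+1}$ to occur, it
suffices that the events $A_j$ and $T^{-s} A_{j+s}$ occur simultaneously. In other words,

$$\bigcup_{j=0}^{n-1} A_j T^{-s} A_{j+s} \subset \bigcup_{j=1}^{n} \{X_j = X^s_j\} \subset \bigcap_{k=0}^{\infty} \{X_{n+k} = X^s_{n+k}\}.$$

Therefore (17.2.6) implies (17.2.8) and convergence

$$P(X^n_k \ne X^{n+s}_k) \to 0 \quad \text{as } n \to \infty$$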




uniformly in $k \ge 0$ and $s \ge 0$. If we introduce the metric $\rho$ putting $\rho(x, y) := 1$ for
$x \ne y$, $\rho(x, x) = 0$, then the aforesaid means that, for any $\delta > 0$, there exists an $N$
such that

$$P(\rho(X^n_k, X^{n+s}_k) > \delta) = P(\rho(X^n_k, X^{n+s}_k) \ne 0) < \delta$$

for $n \ge N$ and any $k \ge 0$, $s \ge 0$, i.e. $X^n_k$ is a Cauchy sequence with respect to
convergence in probability for each $k$. Because any space $\mathcal{X}$ is complete with such a metric,
there exists a random variable $X^k$ such that $X^n_k \xrightarrow{p} X^k$ as $n \to \infty$ (see Lemma 4.2).
Due to the specific nature of the metric $\rho$ this means that

$$P(X^n_k \ne X^k) \to 0 \quad \text{as } n \to \infty. \tag{17.2.9}$$

The sequence $X^k$ is stationary. Indeed, as $n \to \infty$,

$$P(X^{k+1} \ne U X^k) = P(X^n_{k+1} \ne U X^n_k) + o(1) = P(X^n_{k+1} \ne X^{n-1}_{k+1}) + o(1) = o(1).$$

Since the probability $P(X^{k+1} \ne U X^k)$ does not depend on $n$, $X^{k+1} = U X^k$ a.s.
Further, $X_{n+k+1} = f(X_{n+k}, \xi_{n+k})$, and therefore

$$X^n_{k+1} = U^{-n} f(X_{n+k}, \xi_{n+k}) = f(X^n_k, \xi_k). \tag{17.2.10}$$

The left and right-hand sides here converge in probability to $X^{k+1}$ and $f(X^k, \xi_k)$,
respectively. This means that $X^{k+1} = f(X^k, \xi_k)$.

To prove convergence (17.2.7) it suffices to note that, by virtue of (17.2.10), the
values $X^n_k$ and $X^k$, after having become equal for some $k$, will never be different for
greater values of $k$. Therefore, as well as (17.2.9) one has the relation

$$P\Big( \bigcup_{k \ge 0} \{X^n_k \ne X^k\} \Big) = P\Big( \bigcup_{k \ge 0} \{X_{n+k} \ne X^{n+k}\} \Big) \to 0 \quad \text{as } n \to \infty,$$

which is equivalent to (17.2.7).
The last assertion of the theorem follows from Theorem 16.2.5. The theorem is 
proved. 


Remark 17.2.1 It turns out that condition (17.2.6) is also a necessary one for
convergence (17.2.7) (see [6]). For more details on convergence of stochastic recursive
sequences and their generalisations, and also on the relationship between (17.2.6)
and conditions (I) and (II) from Chap. 13, see [6].


In Example 17.2.1 the sequence $X^k$ was actually found in an explicit form (see
(17.2.3) and (17.2.5)):

$$X^k = \overline{S}_{k-1,\infty} = \sup_{j \ge -1} S_{k-1,j}. \tag{17.2.11}$$

These random variables are proper by Corollary 16.3.1. It is not hard to also see
that, for $X_0 = 0$, one has (see (17.2.3))

$$U^{-n} X_{n+k} \uparrow X^k. \tag{17.2.12}$$


17.2.2. Boundedness of Random Sequences 


Consider now conditions of boundedness of an s.r.s. in spaces $\mathcal{X} = [0, \infty)$ and
$\mathcal{X} = (-\infty, \infty)$. Assertions about boundedness will be stated in terms of existence
of stationary majorants, i.e. stationary sequences $M_n$ such that

$$X_n \le M_n \quad \text{for all } n.$$

Results of this kind will be useful for constructing stationary renovating sequences.

Majorants will be constructed for a class of random sequences more general than
stochastic recursive sequences. Namely, we will consider the class of random
sequences satisfying the inequalities

$$X_{n+1} \le (X_n + h(X_n, \xi_n))^+, \tag{17.2.13}$$

where the measurable function $h$ will in turn be bounded by rather simple functions
of $X_n$ and $\xi_n$. The sequence $\{\xi_n\}$ will be assumed given on the whole axis.


Theorem 17.2.2 Assume that there exist a number $N > 0$ and a measurable
function $g_1$ with $E g_1(\xi_n) < 0$ such that (17.2.13) holds with

$$h(x, y) \le \begin{cases} g_1(y) & \text{for } x > N, \\ g_1(y) + N - x & \text{for } x \le N. \end{cases} \tag{17.2.14}$$

If $X_0 \le M < \infty$, then the stationary sequence

$$M_n = \max(M, N) + \sup_{j \ge -1} S_{n-1,j}, \tag{17.2.15}$$

where $S_{n,-1} = 0$ and $S_{k,j} = g_1(\xi_k) + \cdots + g_1(\xi_{k-j})$ for $j \ge 0$, is a majorant for $X_n$.


Proof For brevity's sake, put $\zeta_i := g_1(\xi_i)$, $Z := \max(M, N)$, and $Z_n := X_n - Z$.
Then $Z_n$ will satisfy the following inequalities:

$$Z_{n+1} \le \begin{cases} (Z_n + Z + \zeta_n)^+ - Z \le (Z_n + \zeta_n)^+ & \text{for } Z_n > N - Z, \\ (N + \zeta_n)^+ - Z \le \zeta_n^+ & \text{for } Z_n \le N - Z. \end{cases}$$

Consider now a sequence $\{Y_n\}$ defined by the relations $Y_0 = 0$ and

$$Y_{n+1} = (Y_n + \zeta_n)^+.$$

Assume that $Z_n \le Y_n$. If $Z_n > N - Z$ then

$$Z_{n+1} \le (Z_n + \zeta_n)^+ \le (Y_n + \zeta_n)^+ = Y_{n+1}.$$

If $Z_n \le N - Z$ then

$$Z_{n+1} \le \zeta_n^+ \le (Y_n + \zeta_n)^+ = Y_{n+1}.$$

Because $Z_0 \le 0 = Y_0$, it is evident that $Z_n \le Y_n$ for all $n$. But we know the solution
of the equation for $Y_n$ and, by virtue of (17.2.11) and (17.2.13),

$$X_n - Z \le \sup_{j \ge -1} S_{n-1,j}.$$


The theorem is proved. 
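
The comparison argument of the proof, $X_n - Z \le Y_n$ with $Y_{n+1} = (Y_n + \zeta_n)^+$, can be observed on a simulated path. Everything specific in the sketch below is an assumption made for illustration: the constants `N` and `M`, the choice $g_1(y) = y$, the particular increment `h` (which satisfies (17.2.14)), and the uniform driving values with negative mean.

```python
import random

N, M = 2.0, 5.0          # hypothetical threshold N and bound M on X_0
Z = max(M, N)


def h(x, y):
    """An increment satisfying (17.2.14) with g_1(y) = y."""
    return y if x > N else y + 0.5 * (N - x)


def run(n, seed=3):
    """Return the pairs (X_k, Y_k), k = 0..n, driven by the same values zeta_k."""
    rng = random.Random(seed)
    x, y_ref = M, 0.0                        # X_0 = M, Y_0 = 0
    pairs = [(x, y_ref)]
    for _ in range(n):
        zeta = rng.uniform(-1.5, 1.0)        # E g_1(xi_n) = -0.25 < 0
        x = max(0.0, x + h(x, zeta))         # equality case of (17.2.13)
        y_ref = max(0.0, y_ref + zeta)       # Y_{n+1} = (Y_n + zeta_n)^+
        pairs.append((x, y_ref))
    return pairs
```

Along the whole path the shifted value `x - Z` never exceeds `y_ref`, so the reference sequence, and hence the stationary majorant (17.2.15) that dominates it, bounds $X_n$ from above.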


Theorem 17.2.2A Assume that there exist a number $N > 0$ and measurable
functions $g_1$ and $g_2$ such that

$$E g_1(\xi_n) < 0, \qquad E g_2(\xi_n) < \infty \tag{17.2.16}$$

and

$$h(x, y) \le \begin{cases} g_1(y) & \text{for } x > N, \\ g_1(y) + g_2(y) & \text{for } x \le N. \end{cases} \tag{17.2.17}$$

If $X_0 \le M < \infty$, then the conditions of Theorem 17.2.2 are satisfied (possibly for
other $N$ and $g_1$) and for $X_n$ there exists a stationary majorant of the form (17.2.15).


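Proof We set $g := -E g_1(\xi_n) > 0$ and find $L > 0$ such that $E(g_2(\xi_n);\ g_2(\xi_n) > L) \le
g/2$. Introduce the function

$$g_1^*(y) := g_1(y) + g_2(y) I(g_2(y) > L).$$

Then $E g_1^*(\xi_n) \le -g/2 < 0$ and

$$\begin{aligned} h(x, y) &\le g_1(y) + g_2(y) I(x \le N) \\ &\le g_1^*(y) + g_2(y) I(x \le N) - g_2(y) I(g_2(y) > L) \\ &\le g_1^*(y) + L\, I(x \le N) \le g_1^*(y) + (L + N - x) I(x \le N) \\ &\le g_1^*(y) + (L + N - x) I(x \le L + N). \end{aligned}$$

This means that inequalities (17.2.14) hold with $N$ replaced with $N^* = N + L$ and $g_1$ replaced with $g_1^*$.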
The theorem is proved. 


Note again that in Theorems 17.2.2 and 17.2.2A we did not assume that $\{X_n\}$ is
an s.r.s.




The reader will notice the similarity of the conditions of Theorems 17.2.2 and
17.2.2A to the boundedness condition in Sect. 15.5, Theorem 13.7.3 and
Corollary 13.7.1.

The form of the assertions of Theorems 17.2.2 and 17.2.2A enables one to
construct stationary renovating events for a rather wide class of nonnegative stochastic
recursive sequences (so that $\mathcal{X} = [0, \infty)$) having, say, a "positive atom" at 0. It is
convenient to write such sequences in the form

$$X_{n+1} = (X_n + h(X_n, \xi_n))^+. \tag{17.2.18}$$


Example 17.2.2 Let an s.r.s. (see (17.1.1)) be described by Eq. (17.2.18) and satisfy
conditions (17.2.14) or (17.2.17), where the function $h$ is sufficiently "regular" to
ensure that

$$B_{n,T} := \bigcap_{t \le T} \{h(t, \xi_n) \le -t\}$$

is an event for any $T$. (For instance, it is enough to require $h(t, v)$ to have at most
a countable set of discontinuity points $t$. Then the set $B_{n,T}$ can be expressed as
the intersection of countably many events $\bigcap_k \{h(t_k, \xi_n) \le -t_k\}$, where $\{t_k\}$ form
a countable set dense on $[0, T]$.) Furthermore, let there exist an $L > 0$ such that

$$P(M_n < L,\ B_{n,L}) > 0 \tag{17.2.19}$$

($M_n$ was defined in (17.2.15)). Then the event $A_n = \{M_n < L\} B_{n,L}$ is clearly a
positive stationary renovating event with the function $g(y) = (h(0, y))^+$, $m = 0$.
(On the set $A_n \in \mathfrak{F}^\xi_n$ we have $X_{n+1} = 0$, $X_{n+2} = h(0, \xi_{n+1})^+$ and so on.) Therefore,
an s.r.s. satisfying (17.2.18) satisfies the conditions of Theorem 17.2.1 and is ergodic
in the sense of assertion (17.2.7).


It can happen that, from a point $t \le L$, it would be impossible to reach the
point 0 in one step, but it could be done in $m > 1$ steps. If $B$ is the set of sequences
$(\xi_n, \ldots, \xi_{n+m})$ that effect such a transition, and $P(M_n < L, B) > 0$, then
$A_n = \{M_n < L\} B$ will also be stationary renovating events.


17.3 Ergodicity Conditions Related to the Monotonicity of f 


Now we consider ergodicity conditions for stochastic recursive sequences that are
related to the analytic properties of the function $f$ from (17.1.1). As we already
noted, the sequence $f(x, \xi_k)$, $k = 1, 2, \ldots$, may be considered as a sequence of
random transformations of the space $\mathcal{X}$. Relation (17.1.1) shows that $X_{n+1}$ is the
result of the application of $n + 1$ random transformations $f(\cdot, \xi_k)$, $k = 0, 1, \ldots, n$,
to the initial value $X_0 = x \in \mathcal{X}$. Denoting by $\xi_n^{n+k}$ the vector $\xi_n^{n+k} = (\xi_n, \ldots, \xi_{n+k})$
and by $f_k$ the $k$-th iteration of the function $f$: $f_1(x, y_1) = f(x, y_1)$, $f_2(x, y_1, y_2) =
f(f(x, y_1), y_2)$ and so on, we can re-write (17.1.1) for $X_0 = x$ in the form

$$X_{n+1} = X_{n+1}(x) = f_{n+1}(x, \xi_0^n),$$

so that the "forward" and "backward" equations hold true:

$$f_{n+1}(x, \xi_0^n) = f(f_n(x, \xi_0^{n-1}), \xi_n) = f_n(f(x, \xi_0), \xi_1^n). \tag{17.3.1}$$
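
The "forward" and "backward" decompositions of the iterated map simply split off the last or the first application of $f$. A minimal sketch, in which `iterate` is a helper name introduced here and the map `step` with $f(x, y) = \max(0, x + y)$ and the driving values `XI` are illustrative assumptions:

```python
def iterate(f, x, ys):
    """f_k(x, y_1, ..., y_k): apply f(., y_1), then f(., y_2), and so on."""
    for y in ys:
        x = f(x, y)
    return x


def step(x, y):
    """Illustrative choice of f: f(x, y) = max(0, x + y)."""
    return max(0, x + y)


XI = [2, -5, 1, 3, -4, -1, 2]    # hypothetical values xi_0, ..., xi_n


def forward(x):
    """f(f_n(x, xi_0, ..., xi_{n-1}), xi_n): peel off the last step."""
    return step(iterate(step, x, XI[:-1]), XI[-1])


def backward(x):
    """f_n(f(x, xi_0), xi_1, ..., xi_n): peel off the first step."""
    return iterate(step, step(x, XI[0]), XI[1:])
```

Both `forward(x)` and `backward(x)` coincide with `iterate(step, x, XI)`. The backward form is the one that makes the dependence on the earliest driving value explicit, which is what monotonicity arguments exploit.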


In the present section we will be studying stochastic recursive sequences for
which the function $f$ from representation (17.1.1) is monotone in the first
argument. To this end, we need to assume that a partial order relation "$\ge$" is defined
in the space $\mathcal{X}$. In the space $\mathcal{X} = \mathbb{R}^d$ of vectors $x = (x(1), \ldots, x(d))$ (or its
subspaces) the order relation can be introduced in a natural way by putting $x_1 \ge x_2$ if
$x_1(k) \ge x_2(k)$ for all $k$.

Furthermore, we will assume that, for each non-decreasing sequence $x_1 \le x_2 \le
\cdots \le x_n \le \ldots$, there exists a limit $x \in \mathcal{X}$, i.e. the smallest element $x \in \mathcal{X}$ for which
$x_k \le x$ for all $k$. In that case we will write $x_k \uparrow x$ or $\lim_{k \to \infty} x_k = x$. In $\mathcal{X} = \mathbb{R}^d$
such convergence will mean conventional convergence. To facilitate this, we will
need to complete the space $\mathbb{R}^d$ by adding points with infinite components.


Theorem 17.3.1 (Loynes) Suppose that the transformation $f = f(x, y)$ and space
$\mathcal{X}$ satisfy the following conditions:

(1) there exists an $x_0 \in \mathcal{X}$ such that $f(x_0, y) \ge x_0$ for all $y \in \mathcal{Y}$;

(2) the function $f$ is monotone in the first argument: $f(x_1, y) \ge f(x_2, y)$ if $x_1 \ge x_2$;

(3) the function $f$ is continuous in the first argument with respect to the above
convergence: $f(x_n, y) \uparrow f(x, y)$ if $x_n \uparrow x$.

Then there exists a stationary random sequence $\{X^n\}$ satisfying Eq. (17.1.1):
$X^{n+1} = U X^n = f(X^n, \xi_n)$, such that

$$U^{-n} X_{n+s}(x_0) \uparrow X^s \quad \text{as } n \to \infty, \tag{17.3.2}$$

where convergence takes place for all elementary outcomes.


Since the distributions of $X_n$ and $U^{-n} X_n$ coincide, in the case where
convergence of random variables $\eta_n \uparrow \eta$ means convergence (in a certain sense) of their
distributions (as is the case when $\mathcal{X} = \mathbb{R}^d$), Theorem 17.3.1 also implies
convergence of the distributions of $X_n$ to that of $X^0$ as $n \to \infty$.


Remark 17.3.1 A substantial drawback of this theorem is that it holds only for a
single initial value $X_0 = x_0$. This drawback disappears if the point $x_0$ is accessible
with probability 1 from any $x \in \mathcal{X}$, and $\xi_k$ are independent. In that case $x_0$ is likely
to be a positive atom, and Theorem 13.6.1 for Markov chains is also applicable.




The limiting sequence $X^s$ in (17.3.2) can be "improper" (in spaces $\mathcal{X} = \mathbb{R}^d$ it
may assume infinite values). The sequence $X^s$ will be proper if the s.r.s. $X_n$ satisfies,
say, the conditions of the theorems of Sect. 15.5 or the conditions of Theorem 17.2.2.


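Proof of Theorem 17.3.1 Put

$$v_s^{-k} := f_{k+s}(x_0, \xi_{-k}^{s-1}) = U^{-k} f_{k+s}(x_0, \xi_0^{s+k-1}) = U^{-k} X_{k+s}(x_0).$$

Here the superscript $-k$ indicates the number of the element of the driving sequence
$\{\xi_n\}_{n=-\infty}^{\infty}$ such that the elements of this sequence starting from that number are used
for constructing the s.r.s. The subscript $s$ is the "time epoch" at which we observe
the value of the s.r.s. From the "backward" equation in (17.3.1) we get that

$$v_s^{-k-1} = f_{k+s}\big(f(x_0, \xi_{-k-1}), \xi_{-k}^{s-1}\big) \ge f_{k+s}(x_0, \xi_{-k}^{s-1}) = v_s^{-k}.$$

This means that the sequence $v_s^{-k}$ increases as $k$ grows, and therefore there exists a
random variable $X^s \in \mathcal{X}$ such that

$$v_s^{-k} = U^{-k} X_{k+s}(x_0) \uparrow X^s \quad \text{as } k \to \infty.$$

Further, $v_s^{-k}$ is a function of $\xi_{-k}^{s-1}$. Therefore, $X^s$ is a function of $\xi_{-\infty}^{s-1}$:

$$X^s = G(\xi_{-\infty}^{s-1}).$$

Hence

$$U X^s = U G(\xi_{-\infty}^{s-1}) = G(\xi_{-\infty}^{s}) = X^{s+1},$$

which means that $\{X^s\}$ is stationary. Using the "forward" equation from (17.3.1),
we obtain that

$$v_s^{-k-1} = f\big(f_{k+s}(x_0, \xi_{-k-1}^{s-2}), \xi_{s-1}\big) = f(v_{s-1}^{-k-1}, \xi_{s-1}).$$

Passing to the limit as $k \to \infty$ gives, since $f$ is continuous, that

$$X^s = f(X^{s-1}, \xi_{s-1}).$$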


The theorem is proved. 


Example 17.2.1 clearly satisfies all the conditions of Theorem 17.3.1 with $\mathcal{X} =
[0, \infty)$, $x_0 = 0$, and $f(x, y) = (x + y)^+$.
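
The monotone backward construction of the Loynes theorem can be traced numerically for $f(x, y) = (x + y)^+$ and $x_0 = 0$. The sketch below computes $U^{-k} X_k(0)$, i.e. the value at time 0 of the recursion started from 0 at time $-k$, for growing $k$. The driving values are integer-valued pseudo-random numbers with negative mean, an assumption made so that all comparisons are exact; the function names are ours.

```python
import random


def step(x, y):
    """f(x, y) = max(0, x + y)."""
    return max(0, x + y)


def backward_iterates(x0, past):
    """past = [xi_{-1}, xi_{-2}, ..., xi_{-K}]; return U^{-k} X_k(x0) for k = 0..K."""
    out = []
    for k in range(len(past) + 1):
        x = x0
        for y in reversed(past[:k]):  # apply xi_{-k}, ..., xi_{-1} in this order
            x = step(x, y)
        out.append(x)
    return out


rng = random.Random(11)
PAST = [rng.randint(-4, 2) for _ in range(300)]
V = backward_iterates(0, PAST)
```

The list `V` is non-decreasing, and `V[k]` equals the maximum of $0$ and the partial sums $\xi_{-1}, \xi_{-1} + \xi_{-2}, \ldots, \xi_{-1} + \cdots + \xi_{-k}$, in agreement with the explicit supremum representation of the stationary solution for this recursion.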


17.4 Ergodicity Conditions for Contracting in Mean Lipschitz 
Transformations 


In this section we will assume that $\mathcal{X}$ is a complete separable metric space with
metric $\rho$. Consider the following conditions on the iterations $X_k(x) = f_k(x, \xi_0^{k-1})$.




Condition (B) (boundedness). For some $x_0 \in \mathcal{X}$ and any $\delta > 0$, there exists an
$N = N_\delta$ such that, for all $n \ge 1$,

$$P(\rho(x_0, X_n(x_0)) > N) = P(\rho(x_0, f_n(x_0, \xi_0^{n-1})) > N) < \delta.$$

It is not hard to see that condition (B) holds (possibly with a different $N$) as soon as
we can establish that, for some $m \ge 1$, the above inequality holds for all $n \ge m$.

Condition (B) is clearly met for stochastic recursive sequences satisfying the
conditions of Theorems 17.2.2 and 17.2.2A or the theorems of Sect. 15.5.


Condition (C) (contraction in mean). The function $f$ is continuous in the first
argument and there exist $m \ge 1$, $\beta > 0$, and a measurable function $q : \mathbb{R}^m \to \mathbb{R}$ such
that, for any $x_1$ and $x_2$,

$$\rho\big(f_m(x_1, \xi_0^{m-1}), f_m(x_2, \xi_0^{m-1})\big) \le q(\xi_0^{m-1}) \rho(x_1, x_2),$$
$$m^{-1} E \ln q(\xi_0^{m-1}) \le -\beta < 0.$$


Observe that conditions (B) and (C) are, generally speaking, not related to each
other. Let, for instance, $\mathcal{X} = \mathbb{R}$, $X_0 \ge 0$, $\xi_n \ge 0$, $\rho(x, y) = |x - y|$, and
$f(x, y) = bx + y$, so that


$$X_{n+1} = bX_n + \xi_n.$$


Then condition (C) is clearly satisfied for $0 < b < 1$, since


$$|f(x_1, y) - f(x_2, y)| = b\,|x_1 - x_2|.$$


At the same time, condition (B) will be satisfied if and only if $\mathbf{E}\ln\xi_0 < \infty$. Indeed, if
$\mathbf{E}\ln\xi_0 = \infty$, then the event $\{\ln\xi_k > -2k\ln b\}$ occurs infinitely often a.s. But $X_{n+1}$
has the same distribution as


$$b^{n+1}X_0 + \sum_{k=0}^{n} b^k\xi_k = b^{n+1}X_0 + \sum_{k=0}^{n} \exp\{k\ln b + \ln\xi_k\},$$


where, in the sum on the right-hand side, the number of terms exceeding $\exp\{-k\ln b\}$
increases unboundedly as $n$ grows. This means that $X_{n+1} \xrightarrow{p} \infty$ as $n \to \infty$. The
case $\mathbf{E}\ln\xi_0 < \infty$ is treated in a similar way. The fact that (B), generally speaking,
does not imply (C) is obvious.
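
As a rough illustration of the contraction property in this linear example, the following sketch runs the recursion $X_{n+1} = bX_n + \xi_n$ from two initial values with the same driving sequence. The function name `iterate`, the value $b = 0.5$ and the exponential law of $\xi_n$ are assumptions chosen here for concreteness.

```python
import random


def iterate(x0, b, xi):
    """Return the path X_0, X_1, ..., X_n of X_{k+1} = b * X_k + xi_k."""
    x = x0
    path = [x]
    for y in xi:
        x = b * x + y
        path.append(x)
    return path


rng = random.Random(0)
b = 0.5
# Nonnegative driving values; the exponential law is an assumption of the sketch.
xi = [rng.expovariate(1.0) for _ in range(30)]

path_a = iterate(0.0, b, xi)
path_b = iterate(10.0, b, xi)

# With a common driving sequence the gap contracts exactly like b**n.
gaps = [abs(u - v) for u, v in zip(path_a, path_b)]
print(gaps[:4])
```

The gap after $n$ steps equals $b^n|x_1 - x_2|$ up to rounding, whatever the driving values are; boundedness of the paths themselves is a separate matter, which is the point of the remark about condition (B).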

As before, we will assume that the “driving” stationary sequence $\{\xi_n\}_{n=-\infty}^{\infty}$ is
given on the whole axis. Denote by $U$ the respective distribution preserving shift
operator.

Convergence in probability and a.s. of a sequence of $\mathcal{X}$-valued random variables
$\eta_n \in \mathcal{X}$ ($\eta_n \xrightarrow{p} \eta$, $\eta_n \xrightarrow{a.s.} \eta$) is defined in the natural way by the relations
$\mathbf{P}(\rho(\eta_n, \eta) > \delta) \to 0$ as $n \to \infty$ and $\mathbf{P}(\rho(\eta_k, \eta) > \delta \text{ for some } k \ge n) \to 0$
as $n \to \infty$ for any $\delta > 0$, respectively.




Theorem 17.4.1 Assume that conditions (B) and (C) are met. Then there exists a
stationary sequence $\{X^n\}$ satisfying (17.1.1):


$$X^{n+1} = UX^n = f(X^n, \xi_n)$$
such that, for any fixed $x$,
$$U^{-n}X_{n+s}(x) \xrightarrow{a.s.} X^s \quad \text{as } n \to \infty. \qquad (17.4.1)$$


This convergence is uniform in $x$ over any bounded subset of $\mathcal{X}$.


Theorem 17.2.2 implies the weak convergence, as $n \to \infty$, of the distributions
of $X_n(x)$ to that of $X^0$. Condition (B) is clearly necessary for ergodicity. As the
example of a generalised autoregressive process below shows, condition (C) is also
necessary in some cases.

Set $Y_n := U^{-n}X_n(x_0)$, where $x_0$ is from condition (B). We will need the following
auxiliary result.


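Lemma 17.4.1 Assume that conditions (B) and (C) are met and the stationary sequence
$\{q(\xi_{km}^{km+m-1})\}_{k=-\infty}^{\infty}$ is ergodic. Then, for any $\delta > 0$, there exists an $n_\delta$ such
that, for all $k \ge 0$,


$$\sup_{k \ge 0} \mathbf{P}\bigl(\rho(Y_{n+k}, Y_n) < \delta \text{ for all } n \ge n_\delta\bigr) \ge 1 - \delta. \qquad (17.4.2)$$


For ergodicity of $\{q(\xi_{km}^{km+m-1})\}$ it suffices that the transformation $T^m$ is metric
transitive.


The lemma means that, with probability 1, the distance $\rho(Y_{n+k}, Y_n)$ tends to zero
uniformly in $k$ as $n \to \infty$. Relation (17.4.2) can also be written as $\mathbf{P}(A_\delta) \le \delta$, where


$$A_\delta := \bigcup_{n \ge n_\delta} \bigl\{\rho(Y_{n+k}, Y_n) \ge \delta\bigr\}.$$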


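Proof of Lemma 17.4.1 By virtue of condition (B), there exists an $N = N_\delta$ such
that, for all $k \ge 1$,


$$\mathbf{P}\bigl(\rho(x_0, X_k(x_0)) > N\bigr) < \frac{\delta}{3}.$$
Hence
$$\mathbf{P}(A_\delta) \le \delta/3 + \mathbf{P}\bigl(A_\delta;\ \rho(x_0, \theta_{n,k}) \le N\bigr).$$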


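The random variable $\theta_{n,k} := U^{-n-k}X_k(x_0)$ has the same distribution as $X_k(x_0)$.
Next, by virtue of (C),


$$\rho(Y_{n+k}, Y_n) = \rho\bigl(f_{n+k}(x_0, \xi_{-n-k}^{-1}),\ f_n(x_0, \xi_{-n}^{-1})\bigr)$$
$$\le q\bigl(\xi_{-m}^{-1}\bigr)\,\rho\bigl(f_{n+k-m}(x_0, \xi_{-n-k}^{-m-1}),\ f_{n-m}(x_0, \xi_{-n}^{-m-1})\bigr)$$
$$= q\bigl(\xi_{-m}^{-1}\bigr)\,\rho\bigl(U^{-n-k}X_{n+k-m}(x_0),\ U^{-n}X_{n-m}(x_0)\bigr). \qquad (17.4.3)$$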


Denote by $B_s$ the set of numbers $n$ of the form $n = lm + s$, $l = 0, 1, 2, \ldots$,
$0 \le s < m$, and put


$$\lambda_j := \ln q\bigl(\xi_{-jm}^{-jm+m-1}\bigr), \quad j = 1, 2, \ldots.$$


Then, for $n \in B_s$, we obtain from (17.4.3) and similar relations that


$$\rho(Y_{n+k}, Y_n) \le \exp\Bigl\{\sum_{j=1}^{l}\lambda_j\Bigr\}\,\rho\bigl(U^{-n-k}X_{k+s}(x_0),\ U^{-n}X_s(x_0)\bigr), \qquad (17.4.4)$$


where the last factor (denote it just by $\rho$) is bounded from above:


$$\rho \le \rho\bigl(x_0, U^{-n-k}X_{k+s}(x_0)\bigr) + \rho\bigl(x_0, U^{-n}X_s(x_0)\bigr).$$


The random variables $U^{-n}X_j(x_0)$ have the same distribution as $X_j(x_0)$. By virtue
of (B), there exists an $N = N_\delta$ such that, for all $j \ge 1$,


$$\mathbf{P}\bigl(\rho(x_0, X_j(x_0)) > N\bigr) < \frac{\delta}{4m}.$$


Hence, for all $n$, $k$ and $s$, we have $\mathbf{P}(\rho > 2N) < \delta/(2m)$, and the right-hand side
of (17.4.4) does not exceed $2N\exp\{\sum_{j=1}^{l}\lambda_j\}$ on the complement set $\{\rho \le 2N\}$.

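Because $\mathbf{E}\lambda_j \le -m\beta < 0$ and the sequence $\{\lambda_j\}$ is metric transitive, by the
ergodic Theorem 16.3.1 we have


$$\sum_{j=1}^{l}\lambda_j < -m\beta l/2$$


for all $l \ge l(\omega)$, where $l(\omega)$ is a proper random variable. Choose $l_1$ and $l_2$ so that the
inequalities
$$-\frac{m\beta l_1}{2} < \ln\delta - \ln 2N, \qquad \mathbf{P}\bigl(l(\omega) > l_2\bigr) < \frac{\delta}{2m}$$


hold. Then, putting


$$l_\delta := \max(l_1, l_2), \qquad n_\delta := m\,l_\delta, \qquad A^s := \bigcup_{n \ge n_\delta,\ n \in B_s} \bigl\{\rho(Y_{n+k}, Y_n) \ge \delta\bigr\},$$


we obtain that


$$\mathbf{P}(A^s) \le \mathbf{P}(\rho > 2N) + \mathbf{P}(A^s;\ \rho \le 2N) \le \frac{\delta}{2m} + \mathbf{P}\Bigl(\bigcup_{l \ge l_\delta}\Bigl\{2N\exp\Bigl\{\sum_{j=1}^{l}\lambda_j\Bigr\} \ge \delta\Bigr\}\Bigr).$$


But the intersection of the events from the last term with $\{l_\delta \ge l(\omega)\}$ is empty. Therefore,
the former event is a subset of the event $\{l(\omega) > l_\delta\}$, and


$$\mathbf{P}(A^s) < \frac{\delta}{m}, \qquad \mathbf{P}(A_\delta) \le \sum_{s=0}^{m-1}\mathbf{P}(A^s) < \delta.$$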


The lemma is proved. 


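Lemma 17.4.2 (Completeness of $\mathcal{X}$ with respect to convergence in probability) Let
$\mathcal{X}$ be a complete metric space. If a sequence of $\mathcal{X}$-valued random elements $\eta_n$ is
such that, for any $\delta > 0$,


$$P_n := \sup_{k \ge 0}\mathbf{P}\bigl(\rho(\eta_{n+k}, \eta_n) > \delta\bigr) \to 0$$


as $n \to \infty$, then there exists a random element $\eta \in \mathcal{X}$ such that $\eta_n \xrightarrow{p} \eta$ (that is,
$\mathbf{P}(\rho(\eta_n, \eta) > \delta) \to 0$ as $n \to \infty$).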


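Proof For given $\varepsilon$ and $\delta$ choose $n_k$, $k = 0, 1, \ldots$, such that
$$\sup_{s}\mathbf{P}\bigl(\rho(\eta_{n_k+s}, \eta_{n_k}) > 2^{-k}\delta\bigr) < \varepsilon 2^{-k},$$


and, for the sake of brevity, put $\zeta_k := \eta_{n_k}$. Consider the set


$$D := \bigcap_{k=0}^{\infty} D_k, \qquad D_k := \bigl\{\omega : \rho(\zeta_{k+1}, \zeta_k) \le 2^{-k}\delta\bigr\}.$$


Then $\mathbf{P}(D) > 1 - 2\varepsilon$ and, for any $\omega \in D$, one has $\rho(\zeta_{k+s}(\omega), \zeta_k(\omega)) < \delta 2^{1-k}$ for
all $s \ge 1$. Hence $\zeta_k(\omega)$ is a Cauchy sequence in $\mathcal{X}$ and there exists an $\eta = \eta(\omega) \in \mathcal{X}$
such that $\zeta_k(\omega) \to \eta(\omega)$. Since $\varepsilon$ is arbitrary, this means that $\zeta_k \xrightarrow{a.s.} \eta$ as $k \to \infty$,
and


$$\mathbf{P}\bigl(\rho(\zeta_0, \eta) > 2\delta\bigr) \le \mathbf{P}\Bigl(\bigcup_{k=0}^{\infty}\bigl\{\rho(\zeta_{k+1}, \zeta_k) > 2^{-k}\delta\bigr\}\Bigr) \le \sum_{k=0}^{\infty}\mathbf{P}\bigl(\rho(\zeta_{k+1}, \zeta_k) > 2^{-k}\delta\bigr) \le 2\varepsilon.$$


Therefore, for any $n \ge n_0$,


$$\mathbf{P}\bigl(\rho(\eta_n, \eta) > 3\delta\bigr) \le \mathbf{P}\bigl(\rho(\eta_n, \eta_{n_0}) > \delta\bigr) + \mathbf{P}\bigl(\rho(\zeta_0, \eta) > 2\delta\bigr) \le 3\varepsilon.$$


Since $\varepsilon$ and $\delta$ are arbitrary, the lemma is proved.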




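Proof of Theorem 17.4.1 From Lemma 17.4.1 it follows that


$$\sup_{k}\mathbf{P}\bigl(\rho(Y_{n+k}, Y_n) > \delta\bigr) \to 0 \quad \text{as } n \to \infty.$$


This means that $Y_n$ is a Cauchy sequence with respect to convergence in probability,
and by Lemma 17.4.2 there exists a random variable $X^0$ such that


$$Y_n = U^{-n}X_n(x_0) \xrightarrow{p} X^0,$$
$$U^{-n}X_{n+s}(x_0) = U^s\bigl(U^{-n-s}X_{n+s}(x_0)\bigr) = U^sY_{n+s} \xrightarrow{p} U^sX^0 \equiv X^s. \qquad (17.4.5)$$
By continuity of $f$,
$$U^{-n}X_{n+s+1}(x_0) = U^{-n}f\bigl(X_{n+s}(x_0), \xi_{n+s}\bigr) = f\bigl(U^{-n}X_{n+s}(x_0), \xi_s\bigr) \xrightarrow{p} f(X^s, \xi_s) = X^{s+1}.$$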


We proved the required convergence for a fixed initial value $x_0$. For an arbitrary
$x \in C_N = \{z : \rho(x_0, z) \le N\}$, one has


$$\rho\bigl(U^{-n}X_n(x), X^0\bigr) \le \rho\bigl(U^{-n}X_n(x), U^{-n}X_n(x_0)\bigr) + \rho\bigl(U^{-n}X_n(x_0), X^0\bigr), \qquad (17.4.6)$$


where the first term on the right-hand side converges in probability to 0 uniformly
in $x \in C_N$. For $n = lm$ this follows from the inequality (see condition (C))


$$\rho\bigl(U^{-n}X_n(x), U^{-n}X_n(x_0)\bigr) \le N\exp\Bigl\{\sum_{j=1}^{l}\lambda_j\Bigr\} \qquad (17.4.7)$$


and the above argument. Similar relations hold for $n = lm + s$, $m > s > 0$. This,
together with (17.4.5) and (17.4.6), implies that


$$U^{-n}X_{n+s}(x) \xrightarrow{p} X^s \equiv U^sX^0$$


uniformly in $x \in C_N$. This proves the assertion of the theorem in regard to convergence
in probability.

We now prove convergence with probability 1. To this end, one should repeat
the argument proving Lemma 17.4.1, but bounding $\rho(X^0, U^{-n}X_n(x))$ rather than
$\rho(Y_{n+k}, Y_n)$. Assuming for simplicity’s sake that $s = 0$ ($n$ is a multiple of $m$), we
get (similarly to (17.4.4)) that, for any $x$,


$$\rho\bigl(X^0, U^{-n}X_n(x)\bigr) \le \rho\bigl(x, U^{-n}X^0\bigr)\exp\Bigl\{\sum_{j=1}^{l}\lambda_j\Bigr\}. \qquad (17.4.8)$$


The rest of the argument of Lemma 17.4.1 remains unchanged. This implies that,
for any $\delta > 0$ and sufficiently large $n_\delta$,


$$\mathbf{P}\Bigl(\bigcup_{n \ge n_\delta}\bigl\{\rho(X^0, U^{-n}X_n(x)) > \delta\bigr\}\Bigr) < \delta.$$



Theorem 17.4.1 is proved. 


Example 17.4.1 (Generalised autoregression) Let $\mathcal{X} = \mathbb{R}$. A generalised autoregression
process is defined by the relations


$$X_{n+1} = G\bigl(\zeta_nF(X_n) + \eta_n\bigr), \qquad (17.4.9)$$


where $F$ and $G$ are functions mapping $\mathbb{R} \to \mathbb{R}$ and $\xi_n = (\zeta_n, \eta_n)$ is a stationary
ergodic driving sequence, so that $\{X_n\}$ is an s.r.s. with the function


$$f(x, y) = G\bigl(y_1F(x) + y_2\bigr), \qquad y = (y_1, y_2) \in \mathcal{Y} = \mathbb{R}^2.$$


If the functions $F$ and $G$ are nondecreasing and left continuous, $G(x) \ge 0$ for all
$x \in \mathbb{R}$, and the elements $\zeta_n$ are nonnegative, then the process (17.4.9) satisfies the
condition of Theorem 17.3.1, and therefore $U^{-n+s}X_n(0) \uparrow X^s$ with probability 1 (as
$n \to \infty$). To establish convergence to a proper stationary sequence $X^s$, one has to
prove uniform boundedness in probability (in $n$) of the sequence $X_n(0)$ (see below).

Now we will establish under what conditions the sequence (17.4.9) will satisfy
the conditions of Theorem 17.4.1. Suppose that the functions $F$ and $G$ satisfy the
Lipschitz condition:


$$|G(x_1) - G(x_2)| \le c_G|x_1 - x_2|, \qquad |F(x_1) - F(x_2)| \le c_F|x_1 - x_2|.$$
Then
$$|f(x_1, \xi_0) - f(x_2, \xi_0)| \le c_G\bigl|\zeta_0\bigl(F(x_1) - F(x_2)\bigr)\bigr| \le c_Fc_G|\zeta_0||x_1 - x_2|. \qquad (17.4.10)$$


Theorem 17.4.2 Under the above assumptions, the sequence (17.4.9) will satisfy
condition (C) if
$$\ln c_Gc_F + \mathbf{E}\ln|\zeta_0| < 0. \qquad (17.4.11)$$


The sequence (17.4.9) will satisfy condition (B) if (17.4.11) holds and, moreover,
$$\mathbf{E}\bigl(\ln|\eta_0|\bigr)^+ < \infty. \qquad (17.4.12)$$


When (17.4.11) and (17.4.12) hold, the sequence (17.4.9) has a stationary majorant,
i.e. there exists a stationary sequence $M_n$ (depending on $X_0$) such that $|X_n| \le M_n$
for all $n$.
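
The following sketch illustrates condition (17.4.11) and the pathwise bound that follows from (17.4.10). Everything concrete in it is an assumption made for illustration only: the choices $F(x) = 0.8\sin x$ and $G(x) = \max(x, 0)$ (Lipschitz with $c_F = 0.8$, $c_G = 1$), the two-point law of $\zeta_n$ and the uniform law of $\eta_n$.

```python
import math
import random

C_F, C_G = 0.8, 1.0          # Lipschitz constants of the F and G chosen below
ZETA_VALUES = (0.5, 2.0)     # zeta_n takes each value with probability 1/2


def F(x):
    return C_F * math.sin(x)


def G(x):
    return max(x, 0.0)


def step(x, zeta, eta):
    """One step of the generalised autoregression X_{n+1} = G(zeta*F(X_n) + eta)."""
    return G(zeta * F(x) + eta)


def contraction_exponent():
    """Left-hand side of (17.4.11) for the two-point law of zeta."""
    mean_log = sum(math.log(abs(z)) for z in ZETA_VALUES) / len(ZETA_VALUES)
    return math.log(C_G * C_F) + mean_log


rng = random.Random(3)
x1, x2 = -4.0, 6.0
gaps, bounds = [abs(x1 - x2)], [abs(x1 - x2)]
for _ in range(200):
    zeta, eta = rng.choice(ZETA_VALUES), rng.uniform(-1.0, 1.0)
    x1, x2 = step(x1, zeta, eta), step(x2, zeta, eta)
    gaps.append(abs(x1 - x2))
    # Iterating (17.4.10): the gap is at most the product of c_F*c_G*|zeta_j|.
    bounds.append(bounds[-1] * C_F * C_G * abs(zeta))

print(contraction_exponent(), gaps[-1], bounds[-1])
```

Individual factors $c_Fc_G|\zeta_j|$ may exceed 1 here; only their logarithms have negative mean, which is exactly what “contraction in mean” asks for.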


Proof That condition (C) for $\rho(x_1, x_2) = |x_1 - x_2|$ follows from (17.4.10) is obvious.
We prove (B). To do this, we will construct a stationary majorant for $|X_n|$. One
could do this using Theorems 17.2.2 and 17.2.2A. In our case, it is simpler to prove
it directly, making use of the inequalities


$$|G(x)| \le |G(0)| + c_G|x|, \qquad |F(x)| \le |F(0)| + c_F|x|,$$


where we assume, for simplicity’s sake, that $G(0)$ and $F(0)$ are finite. Then
$$|X_{n+1}| \le |G(0)| + c_G|\zeta_n|\cdot|F(X_n)| + c_G|\eta_n|$$
$$\le |G(0)| + c_Gc_F|\zeta_n|\cdot|X_n| + c_G|\zeta_n|\cdot|F(0)| + c_G|\eta_n| = \beta_n|X_n| + \gamma_n,$$
where


$$\beta_n := c_Gc_F|\zeta_n| \ge 0, \qquad \gamma_n := |G(0)| + c_G|\zeta_n|\cdot|F(0)| + c_G|\eta_n|,$$


$$\mathbf{E}\ln\beta_n < 0, \qquad \mathbf{E}(\ln\gamma_n)^+ < \infty.$$
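From this we get that, for $X_0 = x$,


$$|X_{n+1}| \le |x|\prod_{j=0}^{n}\beta_j + \sum_{l=0}^{n-1}\Bigl(\prod_{j=n-l}^{n}\beta_j\Bigr)\gamma_{n-l-1} + \gamma_n,$$


$$U^{-n}|X_{n+1}| \le |x|\prod_{j=-n}^{0}\beta_j + \sum_{l=0}^{\infty}\Bigl(\prod_{j=-l}^{0}\beta_j\Bigr)\gamma_{-l-1} + \gamma_0. \qquad (17.4.13)$$


Put
$$\alpha_j := \ln\beta_j, \qquad S_l := \sum_{j=-l}^{0}\alpha_j.$$


By the strong law of large numbers, there are only finitely many positive values
$S_l - al$, where $2a = \mathbf{E}\alpha_j < 0$. Therefore, for all $l$ except for those with $S_l - al > 0$,


$$\prod_{j=-l}^{0}\beta_j < e^{al}.$$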


On the other hand, $\gamma_{-l-1}$ exceeds the level $l$ only finitely often. This means that the
series in (17.4.13) (denote it by $R$) converges with probability 1. Moreover,


$$S = \sup_{k \ge 0} S_k \ge S_0$$


is a proper random variable. As a result, we obtain that, for all $n$,
$$U^{-n}|X_{n+1}| \le |x|e^{S} + R + \gamma_0,$$


where all the terms on the right-hand side are proper random variables. The required
majorant


$$M_n := U^{n-1}\bigl(|x|e^{S} + R + \gamma_0\bigr)$$


is constructed. This implies that (B) is met. The theorem is proved.




The assertion of Theorem 17.4.2 can be extended to the multivariable case
$\mathcal{X} = \mathbb{R}^d$, $d > 1$, as well (see [6]).

Note that conditions (17.4.11) and (17.4.12) are, in a certain sense, necessary not
only for convergence $U^{-n+s}X_n(x) \to X^s$, but also for the boundedness of $X_n(x)$
(or of $X^0$) only. This fact can be best illustrated in the case when $F(t) \equiv G(t) \equiv t$.
In that case, $U^{-n}X_{n+s+1}(x)$ and $X^{s+1}$ admit explicit representations


$$U^{-n}X_{n+s+1}(x) = x\prod_{j=-n}^{s}\zeta_j + \sum_{l=0}^{n+s-1}\prod_{j=s-l}^{s}\zeta_j\,\eta_{s-l-1} + \eta_s,$$
$$X^{s+1} = \sum_{l=0}^{\infty}\prod_{j=s-l}^{s}\zeta_j\,\eta_{s-l-1} + \eta_s.$$
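
For $s = 0$ and $F(t) \equiv G(t) \equiv t$ the recursion is $X_{j+1} = \zeta_jX_j + \eta_j$, and the finite-$n$ representation reads $x\prod_{j=-n}^{0}\zeta_j + \sum_{l=0}^{n-1}\prod_{j=-l}^{0}\zeta_j\,\eta_{-l-1} + \eta_0$. The sketch below compares it with a direct run of the recursion started at time $-n$; the function names, the dictionaries indexed by time and the distributions of $\zeta_j$, $\eta_j$ are assumptions of the illustration.

```python
import math
import random


def run_recursion(x, zeta, eta, n):
    """Run X_{j+1} = zeta_j * X_j + eta_j for j = -n, ..., 0, starting from x."""
    for j in range(-n, 1):
        x = zeta[j] * x + eta[j]
    return x


def explicit_form(x, zeta, eta, n):
    """Explicit representation of the same quantity (case s = 0)."""
    total = x * math.prod(zeta[j] for j in range(-n, 1))
    for l in range(0, n):
        total += math.prod(zeta[j] for j in range(-l, 1)) * eta[-l - 1]
    return total + eta[0]


rng = random.Random(7)
n = 20
# Driving values on the times -n, ..., 0; both laws are assumptions of the sketch.
zeta = {j: rng.uniform(0.2, 1.2) for j in range(-n, 1)}
eta = {j: rng.uniform(-1.0, 1.0) for j in range(-n, 1)}

direct = run_recursion(3.0, zeta, eta, n)
closed = explicit_form(3.0, zeta, eta, n)
print(direct, closed)
```

The two numbers agree up to rounding, which is a quick way to check the index ranges in the products and sums.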


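Assume that $\mathbf{E}\ln\zeta \ge 0$, $\eta \equiv 1$, and put


$$s := 0, \qquad z_j := \ln\zeta_j, \qquad Z_l := \sum_{j=-l}^{0}z_j.$$


Then
$$X^1 = 1 + \sum_{l=0}^{\infty}e^{Z_l}, \quad \text{where} \quad \sum_{l=0}^{\infty}I(Z_l \ge 0) = \infty$$


with probability 1, and consequently $X^1 = \infty$ and $X_n \to \infty$ with probability 1.
If $\mathbf{E}[\ln\eta]^+ = \infty$ and $\zeta \equiv b < 1$ then


$$X^1 = \eta_0 + b\sum_{l=0}^{\infty}\exp\{y_{-l-1} + l\ln b\},$$


where $y_j = \ln\eta_j$; the event $\{y_{-l-1} > -l\ln b\}$ occurs infinitely often with probability 1.
This means that $X^1 = \infty$ and $X_n \to \infty$ with probability 1.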


Chapter 18 
Continuous Time Random Processes 


Abstract This chapter presents elements of the general theory of continuous time 
processes. Section 18.1 introduces the key concepts of random processes, sample 
paths, cylinder sets and finite-dimensional distributions, the spaces of continuous 
functions and functions without discontinuities of the second kind, and equivalence 
of random processes. Section 18.2 presents the fundamental results on regularity 
of processes: Kolmogorov’s theorem on existence of a continuous modification and 
Kolmogorov–Chentsov’s theorem on existence of an equivalent process with trajectories
without discontinuities of the second kind. The section also contains discussions
of the notions of separability, stochastic continuity and continuity in mean.


18.1 General Definitions 


Definition 18.1.1 A random process¹ is a family of random variables $\xi(t) = \xi(t, \omega)$
given on a common probability space $(\Omega, \mathfrak{F}, \mathbf{P})$ and depending on a parameter $t$
taking values in some set $T$.


A random process will be written as $\{\xi(t),\ t \in T\}$.

The sequences of random variables $\xi_1, \xi_2, \ldots$ considered in the previous sections
are random processes for which $T = \{1, 2, 3, \ldots\}$. The same is true of the
sums $S_1, S_2, \ldots$ of $\xi_1, \xi_2, \ldots$. Markov chains $\{X_n;\ n = 0, 1, \ldots\}$, martingales
$\{X_n;\ n \in N\}$, stationary and stochastic recursive sequences described in previous chapters
are also random processes. The processes for which the set $T$ can be identified with
the whole sequence $\{\ldots, -1, 0, 1, \ldots\}$ or a part thereof are usually called random
processes in discrete time, or random sequences.

If $T$ coincides with a certain real interval $T = [a, b]$ (this may be the whole real
line $-\infty < t < \infty$ or the half-line $t \ge 0$), then the collection $\{\xi(t),\ t \in T\}$ is said to
be a process in continuous time.

Simple examples of such objects are renewal processes $\{\eta(t),\ t \ge 0\}$ described
in Chap. 10.


¹As well as the term “random process” one also often uses the terms “stochastic” or “probabilistic”
processes.


A.A. Borovkov, Probability Theory, Universitext, 527 
DOI 10.1007/978-1-4471-5201-9_18, © Springer-Verlag London 2013 




In the present chapter we will be considering continuous time processes only.
Interpretation of the parameter $t$ as time is, of course, not imperative. It appeared
historically because in most problems from the natural sciences which led to the
concept of random process the parameter $t$ had the meaning of time, and the value
$\xi(t)$ was what one would observe at time $t$.

The movement of a gas molecule as time passes, the storage level in a water 
reservoir, oscillations of an airplane’s wing etc could be viewed as examples of real 
world random processes. 

The random function


$$\xi(t) = \sum_{k=1}^{\infty}2^{-k}\xi_k\sin kt, \qquad t \in [0, 2\pi],$$


where the $\xi_k$ are independent and identically distributed, is also an example of a
random process.
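
A sample path of such a random function can be drawn by fixing one realisation of the coefficients. The sketch below assumes the series has the form $\sum_k 2^{-k}\xi_k\sin kt$, truncates it at a finite number of terms and takes standard normal $\xi_k$; the function name `sample_path`, the truncation level and the normal law are all assumptions made for illustration.

```python
import math
import random


def sample_path(xi, ts):
    """Evaluate the truncated series sum_k 2^{-k} * xi_k * sin(k t) at the points ts.

    xi[0] plays the role of xi_1, xi[1] of xi_2, and so on.
    """
    return [
        sum(2.0 ** (-k) * x * math.sin(k * t) for k, x in enumerate(xi, start=1))
        for t in ts
    ]


rng = random.Random(11)
xi = [rng.gauss(0.0, 1.0) for _ in range(40)]      # one fixed "omega"
ts = [2.0 * math.pi * i / 200 for i in range(201)]  # grid on [0, 2*pi]

path = sample_path(xi, ts)
# Every value of the path is bounded by sum_k 2^{-k} |xi_k|.
envelope = sum(2.0 ** (-k) * abs(x) for k, x in enumerate(xi, start=1))
print(path[:3], envelope)
```

Re-running with a different seed gives a different function of $t$: fixing $\omega$ selects one trajectory, which is the viewpoint taken in the next paragraph.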

Consider a random process $\{\xi(t),\ t \in T\}$. If $\omega \in \Omega$ is fixed, we obtain a function
$\xi(t)$, $t \in T$, which is often called a sample function, trajectory or path of the
process. Thus, the random values here are functions. As before, we could consider
here a sample probability space, which can be constructed for example as follows.
Consider the space $X$ of functions $x(t)$, $t \in T$, to which the trajectories $\xi(t)$ belong.
Let, further, $\mathfrak{B}_X^T$ be the $\sigma$-algebra of subsets of $X$ generated by the sets of the form


$$C = \{x \in X : x(t_1) \in B_1, \ldots, x(t_n) \in B_n\} \qquad (18.1.1)$$


for any $n$, any $t_1, \ldots, t_n$ from $T$, and any Borel sets $B_1, \ldots, B_n$. Sets of this form
are called cylinders; various finite unions of cylinder sets form an algebra generating
$\mathfrak{B}_X^T$. If a process $\xi(t, \omega)$ is given, it defines a measurable mapping of $(\Omega, \mathfrak{F})$
into $(X, \mathfrak{B}_X^T)$ since clearly $\xi^{-1}(C) = \{\omega : \xi(\cdot, \omega) \in C\} \in \mathfrak{F}$ for any cylinder $C$, and
therefore $\xi^{-1}(B) \in \mathfrak{F}$ for any $B \in \mathfrak{B}_X^T$. This mapping induces a distribution $\mathbf{P}_\xi$ on
$(X, \mathfrak{B}_X^T)$ defined by the equalities $\mathbf{P}_\xi(B) = \mathbf{P}(\xi^{-1}(B))$. The triplet $(X, \mathfrak{B}_X^T, \mathbf{P}_\xi)$ is
called the sample probability space. In that space, an elementary outcome $\omega$ is
identified with the trajectory of the process, and the measure $\mathbf{P}_\xi$ is said to be the
distribution of the process $\xi$.

Now if, considering the process $\{\xi(t)\}$, we fix the time epochs $t_1, t_2, \ldots, t_n$, we
will get a multi-dimensional random variable $(\xi(t_1, \omega), \ldots, \xi(t_n, \omega))$. The distributions
of such variables are said to be the finite-dimensional distributions of the
process.

The following function spaces are most often considered as spaces $X$ in the theory
of random processes with continual sets $T$.

1. The set of all functions on $T$:


$$X = \mathbb{R}^T = \prod_{t \in T}\mathbb{R}_t,$$


where $\mathbb{R}_t$ are copies of the real line $(-\infty, \infty)$. This space is usually considered with
the $\sigma$-algebra $\mathfrak{B}_{\mathbb{R}}^T$ of subsets of $\mathbb{R}^T$ generated by cylinders.

2. The space $C(T)$ of continuous functions on $T$ (we will write $C(a, b)$ if
$T = [a, b]$). In this space, along with the $\sigma$-algebra $\mathfrak{B}_C^T$ generated by cylinder subsets
of $C(T)$ (this $\sigma$-algebra is smaller than the similar $\sigma$-algebra in $\mathbb{R}^T$), one also
often considers the $\sigma$-algebra $\mathfrak{B}_{C(T)}$ (the Borel $\sigma$-algebra) generated by the sets
open with respect to the uniform distance


$$\rho(x, y) = \sup_{t \in T}|y(t) - x(t)|, \qquad x, y \in C(T).$$
It turns out that, in the space $C(T)$, we always have $\mathfrak{B}_{C(T)} = \mathfrak{B}_C^T$ (see, e.g., [14]).

3. The space $D(T)$ of functions having left and right limits $x(t - 0)$ and $x(t + 0)$
at each point $t$, the value $x(t)$ being equal either to $x(t - 0)$ or to $x(t + 0)$. If
$T = [a, b]$, it is also assumed that $x(a) = x(a + 0)$ and $x(b) = x(b - 0)$. This space
is often called the space of functions without discontinuities of the second kind.² The
space of functions for which at all other points $x(t) = x(t - 0)$ ($x(t) = x(t + 0)$)
will be denoted by $D_-(T)$ ($D_+(T)$). The space $D_+(T)$ ($D_-(T)$) will be called the
space of right-continuous (left-continuous) functions. For example, the trajectories
of the renewal processes discussed in Chap. 10 belong to $D_+(0, \infty)$.

In the space $D(T)$ one can also construct the Borel $\sigma$-algebra with respect to
an appropriate metric, but we will restrict ourselves to using the $\sigma$-algebra $\mathfrak{B}_D^T$
of cylindric subsets of $D(T)$.

Now we can formulate the following equivalent definition of a random process.
Let $X$ be a given function space, and $\mathfrak{G}$ be the $\sigma$-algebra of its subsets containing
the $\sigma$-algebra $\mathfrak{B}_X^T$ of cylinders.


Definition 18.1.2 A random process $\xi(t) = \xi(t, \omega)$ is a measurable (in $\omega$) mapping
of $(\Omega, \mathfrak{F}, \mathbf{P})$ into $(X, \mathfrak{G}, \mathbf{P}_\xi)$ (to each $\omega$ one puts into correspondence a function
$\xi(t) = \xi(t, \omega)$ so that $\xi^{-1}(G) = \{\omega : \xi(\cdot) \in G\} \in \mathfrak{F}$ for $G \in \mathfrak{G}$). The distribution $\mathbf{P}_\xi$
is said to be the distribution of the process.


The condition $\mathfrak{B}_X^T \subset \mathfrak{G}$ is needed to ensure that the probabilities of cylinder
sets and, in particular, the probabilities $\mathbf{P}(\xi(t) \in B)$ for Borel sets $B$ are correctly defined,
which means that $\xi(t)$ are random variables.

So far we have tacitly assumed that the process is given and it is known that
its trajectories lie in $X$. However, this is rarely the case. More often one tries to
describe the process $\xi(t)$ in terms of some characteristics of its distribution. One
could, for example, specify the finite-dimensional distributions of the process. From
Kolmogorov’s theorem on consistent distributions³ (see Appendix 2), it follows that


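²A discontinuity of the second kind is associated with either non-fading oscillations of increasing
frequency or escape to infinity.

³Recall the definition of consistent distributions. Let $\mathbb{R}_t$, $t \in T$, be real lines and $\mathfrak{B}_t$ the $\sigma$-algebras
of Borel subsets of $\mathbb{R}_t$. Let $T_n = \{t_1, \ldots, t_n\}$ be a finite subset of $T$. The finite-dimensional
distribution of $(\xi(t_1, \omega), \ldots, \xi(t_n, \omega))$ is the distribution $\mathbf{P}_{T_n}$ on $(\mathbb{R}^{T_n}, \mathfrak{B}^{T_n})$, where $\mathbb{R}^{T_n} = \prod_{t \in T_n}\mathbb{R}_t$
and $\mathfrak{B}^{T_n} = \prod_{t \in T_n}\mathfrak{B}_t$. Let two finite subsets $T'$ and $T''$ of $T$ be given, and $(\mathbb{R}', \mathfrak{B}')$ and $(\mathbb{R}'', \mathfrak{B}'')$
be the respective subspaces of $(\mathbb{R}^T, \mathfrak{B}^T)$. The distributions $\mathbf{P}_{T'}$ and $\mathbf{P}_{T''}$ on $(\mathbb{R}', \mathfrak{B}')$ and $(\mathbb{R}'', \mathfrak{B}'')$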




finite-dimensional distributions uniquely specify the distribution $\mathbf{P}_\xi$ of the process
on the space $(\mathbb{R}^T, \mathfrak{B}_{\mathbb{R}}^T)$. That theorem can be considered as the existence theorem
for random processes in $(\mathbb{R}^T, \mathfrak{B}_{\mathbb{R}}^T)$ with prescribed finite-dimensional distributions.

The space $(\mathbb{R}^T, \mathfrak{B}_{\mathbb{R}}^T)$ is, however, not quite convenient for studying random processes.
The fact is that by no means all relations frequently used in analysis generate
events, i.e. the sets which belong to the $\sigma$-algebra $\mathfrak{B}_{\mathbb{R}}^T$ and whose probabilities
are defined. Based on the definition, we can be sure that only the elements of the
$\sigma$-algebra generated by $\{\xi(t) \in B\}$, $t \in T$, $B$ being Borel sets, are events. The set
$\{\sup_{t \in T}\xi(t) < c\}$, for instance, does not have to be an event, for we only know its
representation in the form $\bigcap_{t \in T}\{\xi(t) < c\}$, which is the intersection of an uncountable
collection of measurable sets when $T$ is an interval on the real line.

Another inconvenience occurs as well: the distribution $\mathbf{P}_\xi$ on $(\mathbb{R}^T, \mathfrak{B}_{\mathbb{R}}^T)$ does not
uniquely specify the properties of the trajectories of $\xi(t)$. The reason is that the
space $\mathbb{R}^T$ is very rich, and if we know that $x(\cdot)$ belongs to a set of the form (18.1.1),
this gives us no information about the behaviour of $x(t)$ at points $t$ different from
$t_1, \ldots, t_n$. The same is true of arbitrary sets $A$ from $\mathfrak{B}_{\mathbb{R}}^T$: roughly speaking, the
relation $x(\cdot) \in A$ can determine the values of $x(t)$ at most at a countable set of
points. We will see below that even such a set as $\{x(t) \equiv 0\}$ does not belong to $\mathfrak{B}_{\mathbb{R}}^T$.
To specify the behaviour of the entire trajectory of the process, it is not sufficient to
give a distribution on $\mathfrak{B}_{\mathbb{R}}^T$; one has to extend this $\sigma$-algebra.

Prior to presenting the respective example, we will give the following definition. 


Definition 18.1.3 Processes $\xi(t)$ and $\eta(t)$ are said to be equivalent (or stochastically
equivalent) if $\mathbf{P}(\xi(t) = \eta(t)) = 1$ for all $t \in T$. In this case the process $\eta$ is
called a modification of $\xi$.


Finite-dimensional distributions of equivalent processes clearly coincide, and
therefore the distributions $\mathbf{P}_\xi$ and $\mathbf{P}_\eta$ on $(\mathbb{R}^T, \mathfrak{B}_{\mathbb{R}}^T)$ coincide, too.


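Example 18.1.1 Put


$$x_a(t) := \begin{cases} 0 & \text{if } t \ne a, \\ 1 & \text{if } t = a, \end{cases}$$


and complete $\mathfrak{B}_X^T$ with the elements $x_a(t)$, $a \in [0, 1]$, and the element $x^0(t) \equiv 0$.
Let $\gamma \in U_{0,1}$. Consider two random processes $\xi_0(t)$ and $\xi_1(t)$ defined as follows:
$\xi_0(t) \equiv x^0(t)$, $\xi_1(t) = x_\gamma(t)$. Then clearly


$$\mathbf{P}\bigl(\xi_0(t) = \xi_1(t)\bigr) = \mathbf{P}(\gamma \ne t) = 1,$$


the processes $\xi_0$ and $\xi_1$ are equivalent, and hence their distributions on $(\mathbb{R}^T, \mathfrak{B}_{\mathbb{R}}^T)$
coincide. However, we see that the trajectories of the processes are substantially
different.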
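
The difference between “equal at every fixed $t$ with probability 1” and “equal as functions” can be seen in a small simulation. The sketch below is only illustrative: the function names `xi0` and `xi1`, the list of fixed time points and the sample size are assumptions. It samples $\gamma$ uniformly on $[0, 1)$ and compares the process that is identically zero with the one that equals 1 only at $t = \gamma$.

```python
import random


def xi0(t, gamma):
    """The process that is identically zero."""
    return 0.0


def xi1(t, gamma):
    """The process equal to 1 at t = gamma and to 0 elsewhere."""
    return 1.0 if t == gamma else 0.0


rng = random.Random(0)
gammas = [rng.random() for _ in range(1000)]   # realisations of gamma
fixed_times = [0.0, 0.25, 0.5, 0.75, 1.0]      # any finite set chosen in advance

# At time points fixed in advance the two processes agree on every realisation.
agree_at_fixed_times = all(
    xi0(t, g) == xi1(t, g) for g in gammas for t in fixed_times
)

# Yet the supremum over the whole trajectory differs: it is attained at t = gamma.
sup_xi0 = 0.0
sup_xi1 = [xi1(g, g) for g in gammas]
print(agree_at_fixed_times, sup_xi0, set(sup_xi1))
```

Any finite (or countable) set of observation times chosen beforehand misses $\gamma$ with probability 1, so finite-dimensional distributions cannot distinguish the two processes, while functionals of the whole path can.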


are said to be consistent if their projections on the common part of subspaces R’ and R” (if it 
exists) coincide. 




It is easy to see from the above example that the set of all continuous functions
$C(T)$, the set $\{\sup_{t \in [0,1]}x(t) < x\}$, the one-point set $\{x(t) \equiv 0\}$ and many others
do not belong to $\mathfrak{B}_{\mathbb{R}}^T$. Indeed, if we assume the contrary, say, that $C(T) \in \mathfrak{B}_{\mathbb{R}}^T$,
then we would get from the equivalence of $\xi_0$ and $\xi_1$ that $\mathbf{P}(\xi_0 \in C(0, 1)) = \mathbf{P}(\xi_1 \in C(0, 1))$,
while the former of these probabilities is 1 and the latter is 0.

The simplest way of overcoming the above difficulties and inconveniences is to
define the processes in the spaces $C(T)$ or $D(T)$ when it is possible. If, for example,
$\xi(t) \in C(T)$ and $\eta(t) \in C(T)$, and they are equivalent, then the trajectories of the
processes will completely coincide with probability 1, since in that case


$$\bigcap_{\text{rational } t}\{\xi(t) = \eta(t)\} = \bigcap_{t \in T}\{\xi(t) = \eta(t)\} = \{\xi(t) = \eta(t) \text{ for all } t \in T\},$$


where the probability of the event on the left-hand side is defined (this is the probability
of the intersection of a countable collection of sets) and equals 1. Similarly,
the probabilities, say, of the events


$$\Bigl\{\sup_{t \in T}\xi(t) < c\Bigr\} = \bigcap_{t \in T}\{\xi(t) < c\}$$
are also defined.

The same argument holds for the spaces D(T), because each element x(-) of D 
is uniquely determined by its values x(t) on a countable everywhere dense set of t 
values (for example, on the set of rationals). 

Now assume that we have somehow established that the original process $\xi(t)$ (let
it be given on $(\mathbb{R}^T, \mathfrak{B}_{\mathbb{R}}^T)$) has a continuous modification, i.e. an equivalent process
$\eta(t)$ such that its trajectories are continuous with probability 1 (or belong to the
space $D(T)$). The above means, first of all, that we have somehow extended the
$\sigma$-algebra $\mathfrak{B}_{\mathbb{R}}^T$ (adding, say, the set $C(T)$) and now consider the distribution of $\xi$
on the $\sigma$-algebra $\widetilde{\mathfrak{B}}^T = \sigma(\mathfrak{B}_{\mathbb{R}}^T, C(T))$ (otherwise the above would not make sense).
But the extension of the distribution of $\xi$ from $(\mathbb{R}^T, \mathfrak{B}_{\mathbb{R}}^T)$ to $(\mathbb{R}^T, \widetilde{\mathfrak{B}}^T)$ may not be
unique. (We saw this in Example 18.1.1; the extension can be given by, say, putting
$\mathbf{P}(\xi \in C(T)) = 0$.) What we said above about the process $\eta$ means that there exists
an extension $\mathbf{P}_\eta$ such that $\mathbf{P}_\eta(C(T)) = \mathbf{P}(\eta \in C(T)) = 1$.

Further, it is often better not to deal with the inconvenient space $(\mathbb{R}^T, \mathfrak{B}_{\mathbb{R}}^T)$ at all.
To avoid it, one can define the distribution of the process $\eta$ on the restricted space
$(C(T), \mathfrak{B}_C^T)$. It is clear that


$$\mathfrak{B}_C^T \subset \widetilde{\mathfrak{B}}^T = \sigma\bigl(\mathfrak{B}_{\mathbb{R}}^T, C(T)\bigr), \qquad \mathfrak{B}_C^T = \mathfrak{B}_{\mathbb{R}}^T \cap C(T)$$

(the former $\sigma$-algebra is generated by sets of the form (18.1.1) intersected with
$C(T)$). Therefore, considering the distribution of $\eta$ concentrated on $C(T)$, we can
deal with the restriction of the space $(\mathbb{R}^T, \widetilde{\mathfrak{B}}^T)$ to $(C(T), \mathfrak{B}_C^T)$ and define the probability
on the latter as $\mathbf{P}_\eta(A) = \mathbf{P}(\eta \in A)$, $A \in \mathfrak{B}_C^T \subset \widetilde{\mathfrak{B}}^T$. Thus we have constructed
a process $\eta$ with continuous trajectories which is equivalent to the original process
$\xi$ (if we consider their distributions in $(\mathbb{R}^T, \mathfrak{B}_{\mathbb{R}}^T)$).

To realise this construction, one has now to learn how to find from the distribution
of a process $\xi$ whether it has a continuous modification $\eta$ or not.




Before stating and proving the respective theorems, note once again that the 
above-mentioned difficulties are mainly of a mathematical character, i.e. related 
to the mathematical model of the random process. In real life problems, it is usually 
clear in advance whether the process under consideration is continuous or not. If it 
is “physically” continuous, and we want to construct an adequate model, then, of 
course, of all modifications of the process we have to take the continuous one. 

The same argument remains valid if, instead of continuous trajectories, one considers
trajectories from $D(T)$. The problem essentially remains the same: the difficulties
are eliminated if one can describe the entire trajectory of the process $\xi(\cdot)$ by
the values $\xi(t)$ on some countable set of $t$ values. Processes possessing this property
will be called regular.


18.2 Criteria of Regularity of Processes 


First we will find conditions under which a process has a continuous modification. 
Without loss of generality, we will assume that T is the segment T = [0, 1]. 

A very simple criterion for the existence of a continuous modification is based
on the knowledge of two-dimensional distributions of $\xi(t)$ only.


Theorem 18.2.1 (Kolmogorov) Let $\xi(t)$ be a random process given on $(\mathbb{R}^T, \mathfrak{B}_{\mathbb{R}}^T)$
with $T = [0, 1]$. If there exist $a > 0$, $b > 0$ and $c < \infty$ such that, for all $t$ and $t + h$
from the segment $[0, 1]$,


$$\mathbf{E}|\xi(t + h) - \xi(t)|^a \le c|h|^{1+b}, \qquad (18.2.1)$$
then $\xi(\cdot)$ has a continuous modification.

We will obtain this assertion as a consequence of a more general theorem, of 
which the conditions are somewhat more difficult to comprehend, but have essen- 
tially the same meaning as (18.2.1). 

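Theorem 18.2.2 Let for all $t, t + h \in [0, 1]$,
$$\mathbf{P}\bigl(|\xi(t + h) - \xi(t)| > \varepsilon(h)\bigr) \le q(h),$$


where $\varepsilon(h)$ and $q(h)$ are decreasing even functions of $h$ such that


$$\sum_{n=1}^{\infty}\varepsilon(2^{-n}) < \infty, \qquad \sum_{n=1}^{\infty}2^nq(2^{-n}) < \infty.$$


Then $\xi(\cdot)$ has a continuous modification.
Proof We will make use of approximations of $\xi(t)$ by continuous processes. Put
$t_{n,r} := r2^{-n}$, $r = 0, 1, \ldots, 2^n$,
$$\xi_n(t) := \xi(t_{n,r}) + 2^n(t - t_{n,r})\bigl[\xi(t_{n,r+1}) - \xi(t_{n,r})\bigr] \quad \text{for } t \in [t_{n,r}, t_{n,r+1}].$$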


Fig. 18.1 Illustration to the proof of Theorem 18.2.2: construction of piece-wise
linear approximations to the process $\xi(t)$


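From Fig. 18.1 we see that, for $t \in [t_{n,r}, t_{n,r+1}]$,


$$|\xi_{n+1}(t) - \xi_n(t)| \le \Bigl|\xi(t_{n+1,2r+1}) - \frac{1}{2}\bigl[\xi(t_{n+1,2r}) + \xi(t_{n+1,2r+2})\bigr]\Bigr| \le \frac{1}{2}(\alpha + \beta),$$


where $\alpha := |\xi(t_{n+1,2r+1}) - \xi(t_{n+1,2r})|$, $\beta := |\xi(t_{n+1,2r+1}) - \xi(t_{n+1,2r+2})|$. This implies
that


$$Z_n := \max_{t \in [t_{n,r}, t_{n,r+1}]}|\xi_{n+1}(t) - \xi_n(t)| \le \frac{1}{2}(\alpha + \beta),$$


$$\mathbf{P}\bigl(Z_n > \varepsilon(2^{-n})\bigr) \le \mathbf{P}\bigl(\alpha > \varepsilon(2^{-n})\bigr) + \mathbf{P}\bigl(\beta > \varepsilon(2^{-n})\bigr) \le 2q(2^{-n})$$


(note that since the trajectories of $\xi_n(t)$ are continuous, $\{Z_n > \varepsilon(2^{-n})\} \in \mathfrak{B}_{\mathbb{R}}^T$ which
is not the case in the general situation). Since here we have altogether $2^n$ segments
of the form $[t_{n,r}, t_{n,r+1}]$, $r = 0, 1, \ldots, 2^n - 1$, one has


$$\mathbf{P}\Bigl(\max_{t \in [0,1]}|\xi_{n+1}(t) - \xi_n(t)| > \varepsilon(2^{-n})\Bigr) \le 2^{n+1}q(2^{-n}).$$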


Because $\sum_{n=1}^{\infty}2^nq(2^{-n}) < \infty$, by the Borel–Cantelli criterion, for almost all $\omega$ (i.e.
for $\omega \in A$, $\mathbf{P}(A) = 1$), there exists an $n(\omega)$ such that, for all $n \ge n(\omega)$,


$$\max_{t \in [0,1]}|\xi_{n+1}(t) - \xi_n(t)| = \rho(\xi_{n+1}, \xi_n) \le \varepsilon(2^{-n}).$$


From this it follows that $\xi_n$ is a Cauchy sequence a.s., since
$$\rho(\xi_n, \xi_m) \le \varepsilon_n := \sum_{k=n}^{\infty}\varepsilon(2^{-k}) \to 0$$
as $n \to \infty$ for all $m > n$, $\omega \in A$. Therefore, for $\omega \in A$, there exists the limit
$\eta(t) = \lim_{n \to \infty}\xi_n(t)$, and $|\xi_n(t) - \eta(t)| \le \varepsilon_n$, so that convergence $\xi_n(t) \to \eta(t)$ is
uniform. Together with continuity of $\xi_n(t)$ this implies that $\eta(t)$ is also continuous
(this argument actually shows that the space $C(0, 1)$ is complete).
It remains to verify that $\xi$ and $\eta$ are equivalent. For $t = t_{n,r}$, one has $\xi_{n+k}(t) = \xi(t)$
for all $k \ge 0$, so that $\eta(t) = \xi(t)$. If $t \ne t_{n,r}$ for all $n$ and $r$, then there exists a
sequence $r_n$ such that $t_{n,r_n} \to t$ and $0 < t - t_{n,r_n} < 2^{-n}$, and hence


$$\mathbf{P}\bigl(|\xi(t_{n,r_n}) - \xi(t)| > \varepsilon(t - t_{n,r_n})\bigr) \le q(t - t_{n,r_n}),$$
$$\mathbf{P}\bigl(|\xi(t_{n,r_n}) - \xi(t)| > \varepsilon(2^{-n})\bigr) \le q(2^{-n}).$$
By the Borel–Cantelli criterion this means that $\xi(t_{n,r_n}) \to \xi(t)$ with probability 1. At
the same time, by virtue of the continuity of $\eta(t)$ one has $\eta(t_{n,r_n}) \to \eta(t)$. Because
$\xi(t_{n,r_n}) = \eta(t_{n,r_n})$, we have $\xi(t) = \eta(t)$ with probability 1.
The theorem is proved.
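
The dyadic piece-wise linear approximations $\xi_n$ used in the proof are easy to build for a given trajectory. The sketch below is an illustration under assumptions: the helper names `dyadic_interpolant` and `sup_distance` are introduced here, a deterministic function $\sqrt{t}$ stands in for one fixed trajectory, and the supremum is approximated on a finite dyadic grid.

```python
import math


def dyadic_interpolant(xi, n):
    """Return xi_n: the piece-wise linear function through the points
    (r * 2^{-n}, xi(r * 2^{-n})), r = 0, ..., 2^n, on [0, 1]."""
    step = 2.0 ** (-n)

    def xi_n(t):
        # Index of the left node; t = 1 is assigned to the last segment.
        r = min(int(t / step), 2 ** n - 1)
        t_left = r * step
        return xi(t_left) + 2 ** n * (t - t_left) * (xi((r + 1) * step) - xi(t_left))

    return xi_n


def sup_distance(g, h, grid=4096):
    """Approximate the uniform distance between g and h on a dyadic grid of [0, 1]."""
    return max(abs(g(i / grid) - h(i / grid)) for i in range(grid + 1))


def path(t):
    """A stand-in for one fixed trajectory (an assumption of this sketch)."""
    return math.sqrt(t)


# Uniform distances between consecutive approximations xi_{n+1} and xi_n.
dists = [
    sup_distance(dyadic_interpolant(path, n + 1), dyadic_interpolant(path, n))
    for n in range(1, 8)
]
print(dists)
```

For this trajectory the distances $\rho(\xi_{n+1}, \xi_n)$ decrease geometrically, so their sum converges and the approximations form a Cauchy sequence in the uniform metric, which is the mechanism the proof exploits via the summability of $\varepsilon(2^{-n})$.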




Corollary 18.2.1 If
$$\mathbf{E}|\xi(t + h) - \xi(t)|^a \le \frac{c|h|}{\bigl|\log|h|\bigr|^{1+b}} \qquad (18.2.2)$$


for some $b > a > 0$ and $c < \infty$, then the conditions of Theorem 18.2.2 are satisfied
and hence $\xi(t)$ has a continuous modification.


Condition (18.2.2) will certainly be satisfied if (18.2.1) holds, so that Kol- 
mogorov’s theorem is a consequence of Theorem 18.2.2. 
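
The step from (18.2.1) to (18.2.2) can be spelled out as follows. This is a sketch of one possible argument, with the auxiliary symbols $b_0$, $c_0$, $b'$ and $K$ introduced here: suppose (18.2.1) holds with exponent $b_0 > 0$ and constant $c_0$, and fix any $b' > a$.

```latex
\[
  K := \sup_{0 < x \le 1} x^{b_0}\,\lvert \log x\rvert^{1+b'}
     = \Bigl(\frac{1+b'}{e\,b_0}\Bigr)^{1+b'} < \infty ,
\]
since $u \mapsto e^{-b_0 u}u^{1+b'}$, $u = \lvert\log x\rvert \ge 0$, attains its
maximum at $u = (1+b')/b_0$. Hence, for $0 < \lvert h\rvert \le 1$,
\[
  \mathbf{E}\lvert \xi(t+h) - \xi(t)\rvert^{a}
  \le c_0\lvert h\rvert^{1+b_0}
  = c_0\lvert h\rvert \cdot
    \frac{\lvert h\rvert^{b_0}\,\bigl\lvert\log\lvert h\rvert\bigr\rvert^{1+b'}}
         {\bigl\lvert\log\lvert h\rvert\bigr\rvert^{1+b'}}
  \le \frac{c_0 K\,\lvert h\rvert}{\bigl\lvert\log\lvert h\rvert\bigr\rvert^{1+b'}} ,
\]
which is (18.2.2) with $b = b' > a$ and $c = c_0K$.
```

In words, a power of $|h|$ beats any power of $|\log|h||$ near zero, so the polynomial bound is stronger than the logarithmic one.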


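Proof of Corollary 18.2.1 Put $\varepsilon(h) := \bigl|\log_2|h|\bigr|^{-\beta}$, $1 < \beta < b/a$. Then


$$\sum_{n=1}^{\infty}\varepsilon(2^{-n}) = \sum_{n=1}^{\infty}n^{-\beta} < \infty,$$
and from Chebyshev’s inequality we have
$$\mathbf{P}\bigl(|\xi(t + h) - \xi(t)| > \varepsilon(h)\bigr) \le \frac{c|h|}{\bigl|\log_2|h|\bigr|^{1+b}}\bigl(\varepsilon(h)\bigr)^{-a} = \frac{c|h|}{\bigl|\log_2|h|\bigr|^{1+\delta}} =: q(h),$$


where $\delta = b - a\beta > 0$. It remains to note that


$$\sum_{n=1}^{\infty}2^nq(2^{-n}) = c\sum_{n=1}^{\infty}\bigl|\log_2 2^{-n}\bigr|^{-1-\delta} = c\sum_{n=1}^{\infty}n^{-1-\delta} < \infty.$$


The corollary is proved.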


The criterion for &(t) to have a modification belonging to the space D(T) is more 
complicated to formulate and prove, and is related to weaker conditions imposed on 
the process. We confine ourselves here to simply stating the following assertion. 


Theorem 18.2.3 (Kolmogorov–Chentsov) If, for some $a > 0$, $\beta \ge 0$, $b > 0$, and all
$t$, $h_1 \le t \le 1 - h_2$, $h_1 \ge 0$, $h_2 \ge 0$,
$$\mathbf{E}|\xi(t) - \xi(t - h_1)|^a\,|\xi(t + h_2) - \xi(t)|^{\beta} \le ch^{1+b}, \qquad h = h_1 + h_2, \qquad (18.2.3)$$


then there exists a modification of $\xi(t)$ in $D(0, 1)$.⁴


Condition (18.2.3) admits the following extension:
$$\mathbf{P}\bigl(|\xi(t + h_2) - \xi(t)|\cdot|\xi(t) - \xi(t - h_1)| \ge \varepsilon(h)\bigr) \le q(h), \qquad (18.2.4)$$


where $\varepsilon(h)$ and $q(h)$ have the same meaning as in Theorem 18.2.2. Under condition
(18.2.4) the assertion of the theorem remains valid.

The following two examples illustrate, to a certain extent, the character of the
conditions of Theorems 18.2.1–18.2.3.


⁴For more details, see, e.g., [9].




Example 18.2.1 Assume that a random process $\xi(t)$ has the form


$$\xi(t) = \sum_{k=1}^{r}\xi_kg_k(t),$$


where $g_k(t)$ satisfy the Hölder condition
$$|g_k(t + h) - g_k(t)| \le c|h|^{\alpha},$$


$\alpha > 0$, and $(\xi_1, \ldots, \xi_r)$ is an arbitrary random vector such that all $\mathbf{E}|\xi_k|^l$ are finite
for some $l > 1/\alpha$. Then the process $\xi(t)$ (which is clearly continuous) satisfies condition
(18.2.1). Indeed,


$$\mathbf{E}|\xi(t + h) - \xi(t)|^l \le c_1\sum_{k=1}^{r}\mathbf{E}|\xi_k|^l\,c^l|h|^{\alpha l} \le c_2|h|^{\alpha l}, \qquad \alpha l > 1.$$
k=1 


Example 18.2.2 Let $\gamma \in U_{0,1}$, $\xi(t) = 0$ for $t < \gamma$, and $\xi(t) = 1$ for $t \ge \gamma$. Then
$$\mathbf{E}|\xi(t + h) - \xi(t)|^l = \mathbf{P}\bigl(\gamma \in (t, t + h)\bigr) = h$$


for any $l > 0$. Here condition (18.2.1) is not satisfied, although $|\xi(t + h) - \xi(t)| \xrightarrow{p} 0$
as $h \to 0$. Condition (18.2.3) is clearly met, for


$$\mathbf{E}|\xi(t) - \xi(t - h_1)|\cdot|\xi(t + h_2) - \xi(t)| = 0. \qquad (18.2.5)$$


We will get similar results if we take $\xi(t)$ to be the renewal process for a sequence
$\gamma_1, \gamma_2, \ldots$, where the distribution of $\gamma_j$ has a density. In that case, instead of (18.2.5)
one will obtain the relation


$$\mathbf{E}|\xi(t) - \xi(t - h_1)|\cdot|\xi(t + h_2) - \xi(t)| \le ch_1h_2 \le ch^2.$$
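
The two computations for the single-jump process can be checked by simulation. The sketch below is illustrative only: the sample size, the seed and the particular values of $t$, $h_1$, $h_2$ are assumptions. It estimates the mean absolute increment over $(t, t + h_2]$ and the mean of the product of increments over the two adjacent intervals.

```python
import random


def xi(t, gamma):
    """Single-jump process: 0 before the jump time gamma, 1 from gamma on."""
    return 1.0 if t >= gamma else 0.0


rng = random.Random(0)
gammas = [rng.random() for _ in range(20000)]   # gamma uniform on [0, 1)
t, h1, h2 = 0.5, 0.1, 0.2

# Monte Carlo estimate of E|xi(t + h2) - xi(t)|, which should be close to h2.
single = sum(abs(xi(t + h2, g) - xi(t, g)) for g in gammas) / len(gammas)

# Estimate of E|xi(t) - xi(t - h1)| * |xi(t + h2) - xi(t)|.
product = sum(
    abs(xi(t, g) - xi(t - h1, g)) * abs(xi(t + h2, g) - xi(t, g)) for g in gammas
) / len(gammas)
print(single, product)
```

The product vanishes on every realisation, not just on average: the single jump cannot fall into both of two disjoint adjacent intervals, which is why (18.2.3) holds while (18.2.1) fails.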


In the general case, when we do not have data for constructing modifications 
of the process & in the spaces C(T) or D(T), one can overcome the difficulties 
mentioned in Sect. 18.1 with the help of the notion of separability. 


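Definition 18.2.1 A process $\xi(t)$ is said to be separable if there exists a countable
set $S$ which is everywhere dense in $T$ and

$$\mathbf{P}\Bigl(\limsup_{u \to t,\ u \in S} \xi(u) \ge \xi(t) \ge \liminf_{u \to t,\ u \in S} \xi(u) \ \text{for all } t \in T\Bigr) = 1. \tag{18.2.6}$$

This is equivalent to the property that, for any interval $I \subset T$,

$$\mathbf{P}\Bigl(\sup_{u \in I \cap S} \xi(u) = \sup_{u \in I} \xi(u);\ \inf_{u \in I \cap S} \xi(u) = \inf_{u \in I} \xi(u)\Bigr) = 1.$$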


It is known (Doob’s theorem⁵) that any random process has a separable modification.


⁵ See [14, 26].




Constructing a separable modification of a process, as well as constructing modifications
in spaces $C(T)$ and $D(T)$, means extending the $\sigma$-algebra $\mathfrak{B}_{\mathbb{R}}^{T}$ to which
one adds uncountable intersections of the form

$$A = \bigcap_{u \in I} \{\xi(u) \in [a, b]\} = \Bigl\{\sup_{u \in I} \xi(u) \le b,\ \inf_{u \in I} \xi(u) \ge a\Bigr\},$$

and extending the measure $\mathbf{P}$ to the extended $\sigma$-algebra using the equalities

$$\mathbf{P}(A) = \mathbf{P}\Bigl(\bigcap_{u \in I \cap S} \{\xi(u) \in [a, b]\}\Bigr),$$

where in the probability on the right-hand side we already have an element of $\mathfrak{B}_{\mathbb{R}}^{T}$.

For separable processes, such sets as the set of all nondecreasing functions, the
sets $C(T)$, $D(T)$ and so on, are events. Processes from $C(T)$ or $D(T)$ are automatically
separable. And vice versa, if a process is separable and admits a continuous
modification (modification from $D(T)$) then it will be continuous (belong to $D(T)$)
itself. Indeed, if $\eta$ is a continuous modification of $\xi$ then

$$\mathbf{P}\bigl(\xi(t) = \eta(t) \ \text{for all } t \in S\bigr) = 1.$$

From this and (18.2.6) we obtain

$$\mathbf{P}\Bigl(\limsup_{u \to t,\ u \in S} \eta(u) \ge \xi(t) \ge \liminf_{u \to t,\ u \in S} \eta(u) \ \text{for all } t \in T\Bigr) = 1.$$

Since $\limsup_{u \to t} \eta(u) = \liminf_{u \to t} \eta(u) = \eta(t)$, one has

$$\mathbf{P}\bigl(\xi(t) = \eta(t) \ \text{for all } t \in T\bigr) = 1.$$

In Example 18.1.1, the process $\xi_1(t)$ is clearly not separable. The process $\xi_0(t)$
is a separable modification of $\xi_1(t)$.

As well as pathwise continuity, there is one more way of characterising the con- 
tinuity of a random process. 


Definition 18.2.2 A random process $\xi(t)$ is said to be stochastically continuous if,
for all $t \in T$, as $h \to 0$,

$$\xi(t + h) \xrightarrow{p} \xi(t) \qquad \bigl(\mathbf{P}(|\xi(t + h) - \xi(t)| > \varepsilon) \to 0\bigr).$$

Here we deal with the two-dimensional distributions of $\xi(t)$ only.

It is clear that all processes with continuous trajectories are stochastically continuous.
But not only them. The discontinuous processes from Examples 18.1.1
and 18.2.2 are also stochastically continuous. A discontinuous process is not
stochastically continuous if, for a (random) discontinuity point $\tau$ ($\xi(\tau + 0) \ne \xi(\tau - 0)$),
the probability $\mathbf{P}(\tau = t_0)$ is positive for some fixed point $t_0$.


Definition 18.2.3 A process $\xi(t)$ is said to be continuous in mean of order $r$ (in
mean when $r = 1$; in mean quadratic when $r = 2$) if, for all $t \in T$, as $h \to 0$,

$$\xi(t + h) \xrightarrow{(r)} \xi(t) \quad \text{or, which is the same,} \quad \mathbf{E}|\xi(t + h) - \xi(t)|^r \to 0.$$




The discontinuous process $\xi(t)$ from Example 18.2.2 is continuous in mean of
any order. Therefore the continuity in mean and stochastic continuity do not say
much about the pathwise properties (they only say that a jump in a neighbourhood
of any fixed point $t$ is unlikely). As Kolmogorov’s theorem shows, in order
to characterise the properties of trajectories, one needs quantitative bounds for
$\mathbf{E}|\xi(t + h) - \xi(t)|^r$ or for $\mathbf{P}(|\xi(t + h) - \xi(t)| > \varepsilon)$.

Continuity theorems for moments imply that, for a stochastically continuous process
$\xi(t)$ and any continuous bounded function $g(x)$, the function $\mathbf{E}g(\xi(t))$ is continuous.
This assertion remains valid if we replace the boundedness of $g(x)$ with the
condition that

$$\sup_t \mathbf{E}|g(\xi(t))|^{\alpha} < \infty \quad \text{for some } \alpha > 1.$$


The subsequent Chaps. 19, 21 and 22 will be devoted to studying random processes
which can be given by specifying the explicit form of their finite-dimensional
distributions. To this class belong:


1. Processes with independent increments. 
2. Markov processes. 
3. Gaussian processes. 


In Chap. 22 we will also consider some problems of the theory of processes with 
finite second moments. Chapter 20 contains limit theorems for random processes 
generated by partial sums of independent random variables. 


Chapter 19 
Processes with Independent Increments 


Abstract Section 19.1 introduces the fundamental concept of infinitely divisible
distributions and contains the key theorem on the relationship of such distributions to
processes with independent homogeneous increments. Section 19.2 begins with a
definition of the Wiener process based on its finite-dimensional distributions and 
establishes existence of a continuous modification of the process. It also derives the 
distribution of the maximum of the Wiener process on a finite interval. The Laws 
of the Iterated Logarithm for the Wiener process are established in Sect. 19.3. Sec- 
tion 19.4 is devoted to the Poisson processes, while Sect. 19.5 presents a character- 
isation of the class of processes with independent increments (the Lévy—Khintchin 
theorem). 


19.1 General Properties 


Definition 19.1.1 A process $\{\xi(t),\ t \in [a, b]\}$ given on the interval $[a, b]$ is said
to be a process with independent increments if, for any $n$ and $t_0 < t_1 < \cdots < t_n$,
$a \le t_0$, $t_n \le b$, the random variables $\xi(t_0), \xi(t_1) - \xi(t_0), \ldots, \xi(t_n) - \xi(t_{n-1})$ are
independent.


A process with independent increments is called homogeneous if the distribution
of $\xi(t_1) - \xi(t_0)$ is determined by the length of the interval $t_1 - t_0$ only and is
independent of $t_0$.

In what follows, we will everywhere assume for simplicity’s sake that $a = 0$,
$\xi(0) = 0$ and $b = 1$ or $b = \infty$.


Definition 19.1.2 The distribution of a random variable $\xi$ is called infinitely divisible
(cf. Sect. 8.8) if, for any $n$, the variable $\xi$ can be represented as a sum of
independent identically distributed random variables: $\xi = \xi_{1,n} + \cdots + \xi_{n,n}$. If $\varphi(\lambda)$
is the ch.f. of $\xi$, then this is equivalent to the property that $\varphi^{1/n}$ is a ch.f. for any $n$.


It is clear from the above definitions that, for a homogeneous process with
independent increments, the distribution of $\xi(t)$ is infinitely divisible, because
$\xi(t) = \xi_{1,n} + \cdots + \xi_{n,n}$, where $\xi_{k,n} = \xi(kt/n) - \xi((k - 1)t/n)$ are independent and
distributed as $\xi(t/n)$.


A.A. Borovkov, Probability Theory, Universitext, 539 
DOI 10.1007/978-1-4471-5201-9_19, © Springer-Verlag London 2013 




Theorem 19.1.1

(1) Let $\{\xi(t),\ t \ge 0\}$ be a stochastically continuous homogeneous process with
independent increments, and let $\varphi_t(\lambda) = \mathbf{E}e^{i\lambda\xi(t)}$ be the ch.f. of $\xi(t)$, $\varphi(\lambda) := \varphi_1(\lambda)$. Then

$$\varphi_t(\lambda) = \varphi^t(\lambda), \tag{19.1.1}$$

$\varphi(\lambda) \ne 0$ for any $\lambda$.

(2) Let $\varphi(\lambda)$ be the ch.f. of an infinitely divisible distribution. Then there exists a
random process $\{\xi(t),\ t \ge 0\}$ satisfying the conditions of (1) and such that

$$\mathbf{E}e^{i\lambda\xi(1)} = \varphi(\lambda).$$


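Note that in the theorem the power $\varphi^t(\lambda)$ of the complex number $\varphi(\lambda)$ is understood
as $|\varphi(\lambda)|^t e^{it\alpha(\lambda)}$, where $\alpha(\lambda) = \arg\varphi(\lambda)$ ($\varphi(\lambda) = |\varphi(\lambda)|e^{i\alpha(\lambda)}$). But $\alpha(\lambda)$ is a
multi-valued function, which is defined up to the term $2\pi k$ with integer $k$. Therefore,
for non-integer $t$, the function $\varphi^t(\lambda)$ will be multi-valued as well. Since any
ch.f. is continuous, after crossing the level $2\pi k$ by $\alpha(\lambda)$ (while changing the value of
$\lambda$ from zero, $\alpha(0) = 0$), we are to take the “nearest” branch of $\alpha(\lambda)$ so as to ensure
continuity of the function $\varphi^t(\lambda)$. For example, for the degenerate distribution $\mathbf{I}_1$ we
have $\varphi(\lambda) = e^{i\lambda}$ ($\alpha(\lambda) = \lambda$), so for small $t > 0$, $\varepsilon > 0$ and for $\lambda = 2\pi + \varepsilon$ we are to
set $\varphi^t(\lambda) = e^{i(2\pi + \varepsilon)t}$ rather than $\varphi^t(\lambda) = e^{i\varepsilon t}$ (although $\varphi(\lambda) = e^{i\varepsilon}$).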
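The choice of the continuous branch can be imitated numerically by tracking $\arg\varphi$ along a grid from $0$ to $\lambda$ and undoing the jumps of the principal value. The sketch below is only an illustration under our own assumptions (the function name `continuous_power` and the grid size are hypothetical, and the grid must be fine enough for the argument to change by less than $\pi$ per step); it reproduces the example with the degenerate distribution $\mathbf{I}_1$.

```python
import cmath
import math

def continuous_power(phi, lam, t, steps=2000):
    """phi^t(lam) along the continuous branch of arg phi that starts from arg phi(0) = 0."""
    arg = 0.0
    prev = cmath.phase(phi(0.0))
    for k in range(1, steps + 1):
        cur = cmath.phase(phi(lam * k / steps))
        d = cur - prev
        # undo the jump of the principal value when it wraps around +-pi
        d -= 2 * math.pi * round(d / (2 * math.pi))
        arg += d
        prev = cur
    return abs(phi(lam)) ** t * cmath.exp(1j * t * arg)

# ch.f. of the degenerate distribution concentrated at the point 1
degenerate = lambda lam: cmath.exp(1j * lam)

lam, t = 2 * math.pi + 0.1, 0.25
value = continuous_power(degenerate, lam, t)
```

Here `value` agrees with $e^{i(2\pi + \varepsilon)t}$ for $\varepsilon = 0.1$, and not with the principal-branch value $e^{i\varepsilon t}$.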

Denote by $\mathfrak{L}$ the class of ch.f.s of all infinitely divisible distributions and by
$\mathfrak{L}_1$ the class of the ch.f.s of the distributions of $\xi(t)$ for stochastically continuous
homogeneous processes with independent increments. Then it follows from Theorem 19.1.1
that $\mathfrak{L} = \mathfrak{L}_1$. The class $\mathfrak{L}$ will be characterised in Sect. 19.5.


Proof (1) Let $\xi(t)$ satisfy the conditions of part (1) of the theorem. Then $\xi(t)$ can
be represented as a sum of independent increments

$$\xi(t) = \sum_{j=1}^{n} \bigl[\xi(t_j) - \xi(t_{j-1})\bigr], \qquad t_0 = 0,\ t_n = t,\ t_j > t_{j-1}.$$

From this it follows, in particular, that for $t_j = j/n$, $t = 1$,

$$\varphi(\lambda) = \bigl[\varphi_{1/n}(\lambda)\bigr]^n, \qquad \varphi_{1/n}(\lambda) = \varphi^{1/n}(\lambda).$$

Raising both sides of the last equality to an integer power $k$, we obtain that, for any
rational $r = k/n$, one has

$$\varphi_{k/n}(\lambda) = \varphi^{k/n}(\lambda),$$

which proves (19.1.1) for $t = r$. Now let $t$ be irrational and $r_n := [tn]/n$. Since $\xi(t)$
is a stochastically continuous process, one has $\xi(r_n) \xrightarrow{p} \xi(t)$ as $n \to \infty$, and hence
the corresponding ch.f.s converge: for any $\lambda$,

$$\varphi_{r_n}(\lambda) \to \varphi_t(\lambda).$$

But $\varphi_{r_n}(\lambda) = \varphi^{r_n}(\lambda) \to \varphi^t(\lambda)$. Therefore (19.1.1) necessarily holds true.

Further, by stochastic continuity of $\xi(\cdot)$, we have $\varphi_t(\lambda) = \varphi^t(\lambda) \to 1$ as $t \to 0$
for any $\lambda$. This implies that $\varphi(\lambda) \ne 0$ for any $\lambda$. This completes the proof of the first
assertion of the theorem.

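(2) Observe first that if $\varphi \in \mathfrak{L}$ then, for any $t > 0$, $\varphi^t$ is again a ch.f. Indeed,

$$\varphi^t(\lambda) = \lim_{n \to \infty} \varphi^{[tn]/n}(\lambda),$$

so that $\varphi^t(\lambda)$ is a limit of ch.f.s which is continuous at the point $\lambda = 0$. By the
continuity theorem for ch.f.s, this is again a ch.f.

Now we will construct a random process $\xi(t)$ with independent increments by
specifying its finite-dimensional distributions. Put

$$0 = t_0 < t_1 < \cdots < t_k, \qquad \Delta_j := \xi(t_j) - \xi(t_{j-1}), \qquad \delta_j := t_j - t_{j-1},$$

and observe that

$$\sum_{j=1}^{k} \lambda_j \xi(t_j) = \sum_{j=1}^{k} \lambda_j \sum_{l=1}^{j} \Delta_l = \sum_{j=1}^{k} \Delta_j \sum_{l=j}^{k} \lambda_l.$$

Define the ch.f. of the joint distribution of $\xi(t_1), \ldots, \xi(t_k)$ by the equality (postulating
independence of $\Delta_j$)

$$\mathbf{E}\exp\Bigl\{i\sum_{j=1}^{k} \lambda_j \xi(t_j)\Bigr\} = \mathbf{E}\exp\Bigl\{i\sum_{j=1}^{k} \Delta_j \sum_{l=j}^{k} \lambda_l\Bigr\} := \prod_{j=1}^{k} \varphi^{\delta_j}\Bigl(\sum_{l=j}^{k} \lambda_l\Bigr).$$

Thus, we have used $\varphi$ to define the finite-dimensional distributions of $\xi(t)$ in
$(\mathbb{R}^T, \mathfrak{B}_{\mathbb{R}}^{T})$ with $T = [0, \infty)$ which, as one can easily see, are consistent. By
Kolmogorov’s theorem, there exists a distribution of a random process $\xi(t)$ in
$(\mathbb{R}^T, \mathfrak{B}_{\mathbb{R}}^{T})$. That process is by definition a homogeneous process with independent
increments.

To prove stochastic continuity of $\xi(t)$, note that, as $h \to 0$,

$$\mathbf{E}e^{i\lambda(\xi(t+h) - \xi(t))} = \varphi^h(\lambda) \to \varphi_0(\lambda),$$

where

$$\varphi_0(\lambda) = \begin{cases} 1 & \text{if } \varphi(\lambda) \ne 0, \\ 0 & \text{if } \varphi(\lambda) = 0. \end{cases}$$

Thus the limiting function $\varphi_0(\lambda)$ can assume only two values: 0 and 1. But it is
bound to be a ch.f. since it is continuous at the point $\lambda = 0$ ($\varphi(\lambda) \ne 0$ in a neighbourhood
of the point $\lambda = 0$) and is a limit of ch.f.s. Therefore $\varphi_0(\lambda)$ is continuous,
$\varphi_0(\lambda) \equiv 1$, $\varphi^h(\lambda) \to 1$, and

$$\xi(t + h) - \xi(t) \xrightarrow{p} 0 \quad \text{as } h \to 0.$$

The theorem is proved.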


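Corollary 19.1.1 Let the conditions of part (1) of Theorem 19.1.1 be met. If, for
all $t$, $\mathbf{E}|\xi(t)| < \infty$ then

$$\mathbf{E}\xi(t) = t\,\mathbf{E}\xi(1).$$

If $\mathbf{E}(\xi(1))^2 < \infty$ then

$$\operatorname{Var}\xi(t) = t\operatorname{Var}\xi(1).$$

Proof For the sake of brevity, put $a := \mathbf{E}\xi(1)$. Then, differentiating (19.1.1) in $\lambda$ at
the point $\lambda = 0$, we obtain

$$\begin{aligned}
\mathbf{E}\xi(t) &= -i\varphi_t'(0) = -it\varphi^{t-1}(0)\varphi'(0) = ta, \\
\mathbf{E}\xi^2(t) &= -\varphi_t''(0) = -t(t - 1)\varphi^{t-2}(0)\bigl(\varphi'(0)\bigr)^2 - t\varphi^{t-1}(0)\varphi''(0) \\
&= t(t - 1)a^2 + t\,\mathbf{E}\xi^2(1), \\
\operatorname{Var}\xi(t) &= t\bigl(\mathbf{E}\xi^2(1) - a^2\bigr) = t\operatorname{Var}\xi(1).
\end{aligned}$$

The corollary is proved.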
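The differentiation in the proof of Corollary 19.1.1 can be imitated numerically. In the sketch below we take, purely as an assumed example, the Poisson ch.f. $\varphi(\lambda) = \exp\{\mu(e^{i\lambda} - 1)\}$ with $\mu = 2$ (so that $\mathbf{E}\xi(1) = \operatorname{Var}\xi(1) = \mu$) and recover the mean and variance of $\xi(t)$ from finite differences of $\varphi^t$ at zero. The names `MU`, `phi_t` and `moments_from_chf` are ours.

```python
import cmath

MU = 2.0  # assumed intensity, chosen only for illustration

def phi_t(lam, t):
    """phi^t(lam) for the Poisson ch.f. phi(lam) = exp{MU (e^{i lam} - 1)}."""
    return cmath.exp(t * MU * (cmath.exp(1j * lam) - 1))

def moments_from_chf(t, h=1e-4):
    """Mean and variance of xi(t) from central differences of phi_t at lam = 0."""
    d1 = (phi_t(h, t) - phi_t(-h, t)) / (2 * h)                   # ~ phi_t'(0)
    d2 = (phi_t(h, t) - 2 * phi_t(0.0, t) + phi_t(-h, t)) / h**2  # ~ phi_t''(0)
    mean = (-1j * d1).real        # E xi(t)   = -i phi_t'(0)
    second = (-d2).real           # E xi^2(t) = -phi_t''(0)
    return mean, second - mean**2
```

For instance, `moments_from_chf(1.7)` should return values close to $1.7\mu$ for both the mean and the variance, in agreement with the corollary.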
In the next theorem we put, as before, $T = [0, 1]$ or $T = [0, \infty)$.


Theorem 19.1.2 Homogeneous stochastically continuous processes with independent
increments $\{\xi(t),\ t \in T\}$ have modifications in the space $D(T)$, i.e. the process
$\xi(t)$ can be given in $(D(T), \mathfrak{B}_D^T)$ and hence has no discontinuities of the second
type.


Proof To simplify the argument, assume that $\mathbf{E}\xi^2(1)$ exists, or, which is the same,
that the second derivative $\varphi''(\lambda)$ exists. Then

$$\mathbf{E}\bigl(\xi(t) - \xi(t - h)\bigr)^2 = -\varphi_h''(0) = -h(h - 1)\bigl[\varphi'(0)\bigr]^2 - h\varphi''(0) \le c|h|,$$

$$\mathbf{E}\bigl(|\xi(t + h_2) - \xi(t)|^2\,|\xi(t) - \xi(t - h_1)|^2\bigr) \le c^2 h_1 h_2 \le c^2 (h_1 + h_2)^2,$$

and the assertion follows from the second criterion of Theorem 18.2.3. The theorem
is proved.


In the general case, the proof is more complicated: one has to make use of criterion
(18.2.4) and bounds for $\mathbf{P}(|\xi(t) - \xi(t - h)| > \varepsilon)$.

Now we will consider the two most important processes with independent incre- 
ments: the so-called Wiener and Poisson processes. 


19.2 Wiener Processes. The Properties of Trajectories 


Definition 19.2.1 The Wiener process is a homogeneous process with independent
increments for which the distribution of $\xi(1)$ is normal.


In other words, this is a process for which

$$\varphi(\lambda) = e^{i\lambda a - \sigma^2\lambda^2/2}, \qquad \varphi_t(\lambda) = \varphi^t(\lambda) = e^{i\lambda t a - \sigma^2\lambda^2 t/2}$$

for some $a$ and $\sigma^2 > 0$. The second equality means that the increments $\xi(t + u) - \xi(u)$
are normally distributed with parameters $(at, \sigma^2 t)$. All joint distributions of
$\xi(t_1), \ldots, \xi(t_n)$ are clearly also normal.

The numbers $a$ and $\sigma$ are called the shift and diffusion coefficients, respectively.
Introducing the process $\xi_0(t) := (\xi(t) - at)/\sigma$ which is obtained from $\xi(t)$ by an
affine transformation, we obtain that its ch.f. equals

$$\mathbf{E}e^{i\lambda\xi_0(t)} = e^{-i\lambda at/\sigma}\varphi_t(\lambda/\sigma) = e^{-\lambda^2 t/2}.$$

Such a process with parameters $(0, 1)$ is often called the standard Wiener process.
We consider it in more detail.


Theorem 19.2.1 The Wiener process has a continuous modification. 


This means, as we know, that the Wiener process $\{\xi(t),\ t \in [0, 1]\}$ can be considered
as given on the measurable space $(C(0, 1), \mathfrak{B}_C^{[0,1]})$ of continuous functions.


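Proof We have $\xi(t + h) - \xi(t) \mathrel{\subset\!\!\!=} \Phi_{0,h}$ and $h^{-1/2}(\xi(t + h) - \xi(t)) \mathrel{\subset\!\!\!=} \Phi_{0,1}$. Therefore

$$\mathbf{E}\bigl(\xi(t + h) - \xi(t)\bigr)^4 = h^2\,\mathbf{E}\xi(1)^4 = 3h^2.$$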


This means that the conditions of Theorem 18.2.1 are satisfied. 
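The moment identity used in the proof is easy to check by simulation. The following sketch (the function name, sample size and seed are our assumptions) draws increments of the standard Wiener process over an interval of length $h$ as normal variables with variance $h$ and averages their fourth powers.

```python
import math
import random

def fourth_moment_of_increment(h, n_samples=200_000, seed=2):
    """Monte Carlo estimate of E (w(t+h) - w(t))^4 for the standard Wiener process."""
    rng = random.Random(seed)
    sd = math.sqrt(h)  # the increment over an interval of length h is normal (0, h)
    return sum(rng.gauss(0.0, sd) ** 4 for _ in range(n_samples)) / n_samples
```

The estimate should be close to $3h^2$, which is a bound of the form $c|h|^{1+b}$ with $b = 1$.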


Thus we can assume that $\xi(\cdot) \in C(0, 1)$. The standard Wiener process with continuous
trajectories will be denoted by $\{w(t),\ t \in T\}$.

Now note that the trajectories of the Wiener process $w(t)$, being continuous, are
not differentiable with probability 1 at any given point $t$.

By virtue of the homogeneity of the process, it suffices to prove its nondifferentiability
at the point 0. If, with a positive probability, i.e. on an event set $A \subset \Omega$ with
$\mathbf{P}(A) > 0$, there existed the derivative

$$w'(0) = \lim_{t \to 0} \frac{w(t)}{t},$$

then, on the same event, there would exist the limit

$$\lim_{k \to \infty} \frac{w(2^{-k+1}) - w(2^{-k})}{2^{-k}} = \lim_{k \to \infty} \frac{2w(2^{-k+1})}{2^{-k+1}} - \lim_{k \to \infty} \frac{w(2^{-k})}{2^{-k}} = 2w'(0) - w'(0) = w'(0).$$

But this is impossible for the following reason. The independent differences
$w(2^{-k+1}) - w(2^{-k})$ have the same distribution as $w(2^{-k})$, and with the positive
probability $p = 1 - \Phi(1)$ they exceed the value $\sqrt{2^{-k}}$. That is, the independent
events $B_k = \{w(2^{-k+1}) - w(2^{-k}) > \sqrt{2^{-k}}\}$ have the property $\sum_{k=1}^{\infty} \mathbf{P}(B_k) = \infty$.
By the Borel–Cantelli criterion, this means that with probability 1 there occur infinitely
many events $B_k$, so that

$$\mathbf{P}\Bigl(\limsup_{k \to \infty} \frac{w(2^{-k+1}) - w(2^{-k})}{\sqrt{2^{-k}}} \ge 1\Bigr) = 1.$$

In the same way we find that

$$\mathbf{P}\Bigl(\liminf_{k \to \infty} \frac{w(2^{-k+1}) - w(2^{-k})}{\sqrt{2^{-k}}} \le -1\Bigr) = 1.$$

This implies that, with probability 1,

$$\limsup_{k \to \infty} \frac{w(2^{-k+1}) - w(2^{-k})}{2^{-k}} = \infty, \qquad \liminf_{k \to \infty} \frac{w(2^{-k+1}) - w(2^{-k})}{2^{-k}} = -\infty,$$

and therefore the process $w(t)$ is nondifferentiable at any given point $t$ with probability 1.

A stronger assertion also takes place: with probability 1 there exists no point t 
at which the trajectory of the process w(t) would have a derivative. In other words, 
the Wiener process is nowhere differentiable with probability 1. The proof of this 
fact is much more complicated and lies beyond the scope of the book. 

The reader can easily verify that $w(t)$ has, in a certain sense, a parabola property.
Namely, for any $c > 0$, the process $w^*(t) = c^{-1/2}w(ct)$ is again a Wiener process.

The properties of continuity of trajectories and independence of increments for
the Wiener process allow us to find, in an explicit form, the distributions of

$$\overline{w}(t) := \max_{u \in [0, t]} w(u)$$

and of the time of the first passage of a given level which is defined, for a given
$x > 0$, by

$$\eta(x) := \inf\{t : w(t) \ge x\} = \inf\{t : w(t) = x\}.$$


Theorem 19.2.2

$$\mathbf{P}\bigl(\overline{w}(t) > x\bigr) = 2\mathbf{P}\bigl(w(t) > x\bigr) = 2\Bigl(1 - \Phi\Bigl(\frac{x}{\sqrt{t}}\Bigr)\Bigr). \tag{19.2.1}$$

The distribution of $\eta(1)$ is stable and has the density

$$\frac{1}{\sqrt{2\pi}\,x^{3/2}}\,e^{-1/(2x)}, \qquad x > 0. \tag{19.2.2}$$

Distribution (19.2.1) is sometimes called the double normal tail law, while the
distribution with density (19.2.2) is called the Lévy distribution (see Sect. 8.8).


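Proof Since

$$\{\eta(x) = v\} = \bigcap_{n=1}^{\infty} \bigl\{\overline{w}(v - 1/n) < x,\ w(v) = x\bigr\} \in \mathfrak{F}_v := \sigma\{w(u);\ u \le v\}$$

and $w(t) - w(v) \overset{d}{=} w(t - v)$ for $t > v$ does not depend on $\mathfrak{F}_v$, we have

$$\mathbf{P}\bigl(w(t) > x\bigr) = \int_0^t \mathbf{P}\bigl(\eta(x) \in dv\bigr)\mathbf{P}\bigl(w(t - v) > 0\bigr) = \frac{1}{2}\int_0^t \mathbf{P}\bigl(\eta(x) \in dv\bigr) = \frac{1}{2}\mathbf{P}\bigl(\overline{w}(t) > x\bigr).$$

This implies the first assertion of the theorem.

The same equalities imply that

$$\mathbf{P}\bigl(\eta(x) < v\bigr) = \mathbf{P}\bigl(\overline{w}(v) > x\bigr) = 2\Bigl(1 - \Phi\Bigl(\frac{x}{\sqrt{v}}\Bigr)\Bigr),$$

which yields, for the density $f_\eta$ of the variable $\eta := \eta(1)$,

$$f_\eta(v) = \frac{1}{\sqrt{2\pi}\,v^{3/2}}\,e^{-1/(2v)}.$$

In order to prove that this distribution is stable, note that

$$\eta(n) = \eta_1 + \cdots + \eta_n,$$

where $\eta_j$ are distributed as $\eta$ and are independent (since the path of $w(t)$ first attains
level 1; then level 2, starting at a point with ordinate 1; then level 3, and so on).
Using the same argument as above, we obtain that

$$\mathbf{P}\bigl(\eta(n) < v\bigr) = \mathbf{P}\bigl(\overline{w}(v) > n\bigr) = \mathbf{P}\bigl(\overline{w}(vn^{-2}) > 1\bigr) = \mathbf{P}\bigl(\eta < vn^{-2}\bigr),$$

so the distributions of $\eta$ and $\eta(n)$ coincide up to a scale transformation. This implies
the stability of the distribution of $\eta$ (see Sect. 8.8). Since $\eta \ge 0$ and $\mathbf{P}(\eta > x) \sim \sqrt{\frac{2}{\pi x}}$
as $x \to \infty$, we obtain that it is, up to a scale transformation, the distribution $\mathbf{F}_{1/2,1}$
with parameters $\beta = 1/2$, $\rho = 1$ (cf. Sect. 8.8). The theorem is proved.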
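The two assertions of Theorem 19.2.2 can be cross-checked numerically: integrating the density (19.2.2) over $(0, v)$ should give the double normal tail (19.2.1) with $x = 1$, $t = v$. The sketch below uses only the identity $2(1 - \Phi(y)) = \operatorname{erfc}(y/\sqrt{2})$; the function names and the midpoint integration rule are our own choices.

```python
import math

def max_tail(x, t):
    """P(max_{u <= t} w(u) > x) = 2(1 - Phi(x / sqrt(t))), formula (19.2.1)."""
    return math.erfc(x / math.sqrt(2.0 * t))

def levy_density(v):
    """Density (19.2.2) of eta(1), the first passage time of the level 1."""
    return math.exp(-1.0 / (2.0 * v)) / (math.sqrt(2.0 * math.pi) * v ** 1.5)

def integrate(f, a, b, n=20000):
    """Composite midpoint rule (never evaluates f at the end point a = 0)."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))
```

Besides the agreement of `integrate(levy_density, 0.0, v)` with `max_tail(1.0, v)`, the scaling relation $\mathbf{P}(\overline{w}(v) > n) = \mathbf{P}(\overline{w}(vn^{-2}) > 1)$ used in the stability argument holds for `max_tail` identically.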


19.3 The Laws of the Iterated Logarithm 


Using an argument similar to that employed at the end of the previous section, one
can establish a much stronger assertion: the trajectory of $w(t)$ in the neighbourhood
of the point $t = 0$, graphically speaking, “completely shades” the interior of the
domain bounded by the two curves

$$y = \pm\sqrt{2t \ln\ln\frac{1}{t}}.$$

The exterior of this domain remains untouched. This is the so-called law of the
iterated logarithm.


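Theorem 19.3.1

$$\mathbf{P}\Bigl(\limsup_{t \to 0} \frac{w(t)}{\sqrt{2t \ln\ln\frac{1}{t}}} = 1\Bigr) = 1, \qquad \mathbf{P}\Bigl(\liminf_{t \to 0} \frac{w(t)}{\sqrt{2t \ln\ln\frac{1}{t}}} = -1\Bigr) = 1.$$

Thus, if we consider the sequence of random variables $w(t_n)$, $t_n \downarrow 0$, then, for
any $\varepsilon > 0$,

$$(1 \pm \varepsilon)\sqrt{2t_n \ln\ln\frac{1}{t_n}}$$

will be upper and lower sequences, respectively, for that sequence.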

For processes, we could introduce in a natural way the notions of upper and
lower functions. If, for example, a process $\xi(t)$ belongs to $C(0, \infty)$ or $D(0, \infty)$ (or
is separable on $(0, \infty)$), then the respective definition for the case $t \to \infty$ has the
following form.


Definition 19.3.1 A function $a(t)$ is said to be upper (lower) for the process $\xi(t)$
if, for some sequence $t_n \uparrow \infty$, the events $A_n = \{\sup_{u \ge t_n}(\xi(u) - a(u)) > 0\}$ occur
finitely (infinitely) often with probability 1.


Along with Theorem 19.3.1, we will obtain here the conventional law of the 
iterated logarithm. The proofs of both assertions are essentially identical. We
will prove the latter and derive the former as a consequence. 


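Theorem 19.3.2 (The Law of the Iterated Logarithm)

$$\mathbf{P}\Bigl(\limsup_{t \to \infty} \frac{w(t)}{\sqrt{2t \ln\ln t}} = 1\Bigr) = 1, \qquad \mathbf{P}\Bigl(\liminf_{t \to \infty} \frac{w(t)}{\sqrt{2t \ln\ln t}} = -1\Bigr) = 1.$$

Thus, for any $\varepsilon > 0$, the functions $(1 \pm \varepsilon)\sqrt{2t \ln\ln t}$ are, respectively, upper and
lower for $w(t)$ as $t \to \infty$.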


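Proof of Theorem 19.3.2 First observe that, by L’Hospital’s rule,

$$\mathbf{P}\bigl(w(t) > x\bigr) = \frac{1}{\sqrt{2\pi t}}\int_x^{\infty} e^{-u^2/(2t)}\,du = \frac{1}{\sqrt{2\pi}}\int_{x/\sqrt{t}}^{\infty} e^{-u^2/2}\,du \sim \frac{\sqrt{t}}{\sqrt{2\pi}\,x}\,e^{-x^2/(2t)} \tag{19.3.1}$$

as $x/\sqrt{t} \to \infty$.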
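The quality of the approximation (19.3.1) is easy to inspect numerically. The sketch below (with our own function names) compares the exact normal tail, expressed through the complementary error function, with the right-hand side of (19.3.1) for growing values of $x/\sqrt{t}$.

```python
import math

def normal_tail(x, t):
    """Exact value of P(w(t) > x) for the standard Wiener process."""
    return 0.5 * math.erfc(x / math.sqrt(2.0 * t))

def tail_approx(x, t):
    """Right-hand side of (19.3.1)."""
    return math.sqrt(t) / (math.sqrt(2.0 * math.pi) * x) * math.exp(-x * x / (2.0 * t))

# ratios exact/approximate for x / sqrt(t) = 2, 5, 10 (with t = 1)
ratios = [normal_tail(y, 1.0) / tail_approx(y, 1.0) for y in (2.0, 5.0, 10.0)]
```

The ratios increase towards 1 from below, in agreement with the asymptotic equivalence.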


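Let $a > 1$ and $x_k := \sqrt{2a^k \ln\ln a^k}$. We have to show that, for any $\varepsilon > 0$,

$$\mathbf{P}\Bigl(\limsup_{t \to \infty} \frac{w(t)}{\sqrt{2t \ln\ln t}} < 1 + \varepsilon\Bigr) = 1, \tag{19.3.2}$$

i.e. that, with probability 1, for all sufficiently large $t$,

$$w(t) < (1 + \varepsilon)\sqrt{2t \ln\ln t}.$$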


[Fig. 19.1 Illustration to the proof of Theorem 19.3.2: replacing the curvilinear boundary $y = (1 + \varepsilon)\sqrt{2t \ln\ln t}$ with a step function; marks $a^{k-2}$, $a^{k-1}$, $a^k$ on the $t$-axis]


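To this end it suffices to establish that, with probability 1, there occur only finitely
many events

$$B_k = \Bigl\{\sup_{a^{k-1} \le u \le a^k} w(u) > (1 + \varepsilon)x_{k-1}\Bigr\}.$$

Consider the events

$$A_k := \Bigl\{\sup_{u \le a^k} w(u) > (1 + \varepsilon)x_{k-1}\Bigr\} \supset B_k$$

(see Fig. 19.1). Because $x_k/\sqrt{a^k} \to \infty$ as $k \to \infty$, by Theorem 19.2.2 one has

$$\begin{aligned}
\mathbf{P}(A_k) &= 2\mathbf{P}\bigl(w(a^k) > (1 + \varepsilon)x_{k-1}\bigr) \\
&\sim \sqrt{\frac{2}{\pi}}\,\frac{\sqrt{a^k}}{(1 + \varepsilon)x_{k-1}}\exp\Bigl\{-\frac{2(1 + \varepsilon)^2 a^{k-1}\ln\ln a^{k-1}}{2a^k}\Bigr\} \\
&= \frac{\sqrt{a}}{(1 + \varepsilon)\sqrt{\pi\ln\ln a^{k-1}}} \cdot \frac{1}{(\ln a^{k-1})^{(1+\varepsilon)^2/a}} \\
&= \frac{c(a, \varepsilon)}{\sqrt{\ln(k - 1) + \ln\ln a}\;(k - 1)^{(1+\varepsilon)^2/a}}.
\end{aligned}$$

Put $a := 1 + \varepsilon > 1$. Then clearly

$$\mathbf{P}(A_k) \sim \frac{c(\varepsilon)}{\sqrt{\ln k}\;k^{1+\varepsilon}}$$

as $k \to \infty$.

In the above formulas, $c(a, \varepsilon)$ and $c(\varepsilon)$ are some constants depending on the indicated
parameters. The obtained relation implies that $\sum_{k=1}^{\infty} \mathbf{P}(A_k) < \infty$ and hence
$\sum_{k=1}^{\infty} \mathbf{P}(B_k) < \infty$ (for $B_k \subset A_k$), so that by the Borel–Cantelli criterion
(Theorem 11.1.1) with probability 1 the events $B_k$ occur only finitely often.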

We now prove that, for an arbitrary $\varepsilon > 0$,

$$\mathbf{P}\Bigl(\limsup_{t \to \infty} \frac{w(t)}{\sqrt{2t \ln\ln t}} > 1 - \varepsilon\Bigr) = 1. \tag{19.3.3}$$

It is evident that, together with (19.3.2), this will mean that the first assertion of the
theorem is true.




Consider for $a > 1$ independent increments $w(a^k) - w(a^{k-1})$ and denote by $B_k$
the event

$$B_k := \bigl\{w(a^k) - w(a^{k-1}) > (1 - \varepsilon/2)x_k\bigr\}.$$

Since $w(a^k) - w(a^{k-1})$ is distributed as $w(a^k(1 - a^{-1}))$, by virtue of (19.3.1) we
find, as before, that

$$\mathbf{P}(B_k) \sim \frac{\sqrt{a^k(1 - a^{-1})}}{\sqrt{2\pi}\,(1 - \varepsilon/2)x_k}\exp\Bigl\{-\frac{(1 - \varepsilon/2)^2\,2a^k\ln\ln a^k}{2a^k(1 - a^{-1})}\Bigr\} \sim \frac{c(a, \varepsilon)}{\sqrt{\ln k}}\,k^{-(1 - \varepsilon/2)^2/(1 - a^{-1})}.$$

This implies that, for $a > 2/\varepsilon$, the series $\sum_{k=1}^{\infty} \mathbf{P}(B_k)$ diverges, and hence by the
Borel–Cantelli criterion the events $B_k$ occur infinitely often, with probability 1.
Further, by the symmetry of the process $w(t)$, it follows from relation (19.3.2)
that, for all $k$ large enough and any $\delta > 0$,

$$w(a^k) > -(1 + \delta)x_k.$$

Together with the preceding argument this shows that the event

$$w(a^{k-1}) + \bigl[w(a^k) - w(a^{k-1})\bigr] = w(a^k) > -(1 + \delta)x_{k-1} + (1 - \varepsilon/2)x_k$$

will occur infinitely often. But the right-hand side of the above inequality can be
made greater than $(1 - \varepsilon)x_k$ by choosing an appropriate $a$. Indeed, for this it suffices that

$$-(1 + \delta)x_{k-1} + \frac{\varepsilon}{2}x_k > 0,$$

or, equivalently,

$$(1 + \delta)\sqrt{\frac{\ln\ln a^{k-1}}{a\ln\ln a^k}} < \frac{\varepsilon}{2},$$

which, in turn, can easily be achieved by taking $a$ large enough. Thus relation
(19.3.3) is proved.


The second assertion of the theorem clearly follows from the first by virtue of the
symmetry of the distribution of $w(t)$.


Now we can obtain as a consequence the local law of the iterated logarithm for
the case where $t \to 0$.


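Proof of Theorem 19.3.1 Consider the process $\{W(u) := uw(1/u),\ u > 0\}$, where
we put $W(0) := 0$. The remarkable fact is that the process $\{W(u),\ u \ge 0\}$ is also the
standard Wiener process. Indeed, for $t > u$,

$$\begin{aligned}
\mathbf{E}\exp\bigl\{i\lambda\bigl(W(t) - W(u)\bigr)\bigr\} &= \mathbf{E}\exp\Bigl\{i\lambda\Bigl[tw\Bigl(\frac{1}{t}\Bigr) - uw\Bigl(\frac{1}{u}\Bigr)\Bigr]\Bigr\} \\
&= \mathbf{E}\exp\Bigl\{i\lambda\Bigl[(t - u)w\Bigl(\frac{1}{t}\Bigr) - u\Bigl(w\Bigl(\frac{1}{u}\Bigr) - w\Bigl(\frac{1}{t}\Bigr)\Bigr)\Bigr]\Bigr\} \\
&= \exp\Bigl\{-\frac{\lambda^2}{2}\Bigl(\frac{(t - u)^2}{t} + u^2\Bigl(\frac{1}{u} - \frac{1}{t}\Bigr)\Bigr)\Bigr\} \\
&= \exp\Bigl\{-\frac{\lambda^2}{2}(t - u)\Bigr\}.
\end{aligned}$$

The independence of increments is easiest to prove by establishing their noncorrelatedness.
Indeed,

$$\begin{aligned}
\mathbf{E}\bigl[W(u)\bigl(W(t) - W(u)\bigr)\bigr] &= \mathbf{E}\Bigl[uw\Bigl(\frac{1}{u}\Bigr)\Bigl(tw\Bigl(\frac{1}{t}\Bigr) - uw\Bigl(\frac{1}{u}\Bigr)\Bigr)\Bigr] \\
&= ut\,\mathbf{E}\,w\Bigl(\frac{1}{u}\Bigr)w\Bigl(\frac{1}{t}\Bigr) - u^2\,\mathbf{E}\,w^2\Bigl(\frac{1}{u}\Bigr) = u - u = 0.
\end{aligned}$$

To complete the proof of the theorem, it remains to observe that

$$\limsup_{t \to \infty} \frac{w(t)}{\sqrt{2t \ln\ln t}} = \limsup_{u \to 0} \frac{uw(1/u)}{\sqrt{2u \ln\ln\frac{1}{u}}} = \limsup_{u \to 0} \frac{W(u)}{\sqrt{2u \ln\ln\frac{1}{u}}}.$$

The theorem is proved.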


We could also prove the theorem by repeating the argument from the proof of 
Theorem 19.3.2 with a < 1. 

In conclusion we note that Wiener processes play an important role in many 
theoretical probabilistic considerations and serve as models for describing various 
real-life processes. For example, they provide a good model for the movement of 
a diffusing particle. In this connection, the Wiener processes are also often called 
Brownian motion processes. 

Wiener processes prove to be, in a certain sense, the limiting processes for random
polygons constructed on the vertices $(k/n, S_k/\sqrt{n})$, where $S_k$ are sums of random
variables $\xi_j$ with $\mathbf{E}\xi_j = 0$ and $\operatorname{Var}(\xi_j) = 1$. We will discuss this in more detail
in Chap. 20. The concept of the stochastic integral and many other constructions
and results are also closely related to the Wiener process. 


19.4 The Poisson Process 


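Definition 19.4.1 A homogeneous process $\xi(t)$ with independent increments is said
to be the Poisson process if $\xi(t) - \xi(0)$ has the Poisson distribution.


For simplicity’s sake put $\xi(0) = 0$. If $\xi(1) \mathrel{\subset\!\!\!=} \mathbf{\Pi}_{\mu}$, then

$$\varphi(\lambda) = \mathbf{E}e^{i\lambda\xi(1)} = \exp\{\mu(e^{i\lambda} - 1)\}$$

and, as we know,

$$\varphi_t(\lambda) = \mathbf{E}e^{i\lambda\xi(t)} = \varphi^t(\lambda) = \exp\{\mu t(e^{i\lambda} - 1)\},$$

so that $\xi(t) \mathrel{\subset\!\!\!=} \mathbf{\Pi}_{\mu t}$. We consider the properties of the Poisson process. First of all,
for each $t$, $\xi(t)$ takes only integer values $0, 1, 2, \ldots$. Divide the interval $[0, t)$ into
segments $[0, t_1), [t_1, t_2), \ldots, [t_{n-1}, t_n)$ of lengths $\Delta_i = t_i - t_{i-1}$, $i = 1, \ldots, n$. For
small $\Delta_i$ the distributions of the increments $\xi(t_i) - \xi(t_{i-1})$ will have the property
that

$$\begin{aligned}
\mathbf{P}\bigl(\xi(t_i) - \xi(t_{i-1}) = 0\bigr) &= \mathbf{P}\bigl(\xi(\Delta_i) = 0\bigr) = e^{-\mu\Delta_i} = 1 - \mu\Delta_i + O(\Delta_i^2), \\
\mathbf{P}\bigl(\xi(t_i) - \xi(t_{i-1}) = 1\bigr) &= \mu\Delta_i e^{-\mu\Delta_i} = \mu\Delta_i + O(\Delta_i^2), \\
\mathbf{P}\bigl(\xi(t_i) - \xi(t_{i-1}) \ge 2\bigr) &= O(\Delta_i^2).
\end{aligned} \tag{19.4.1}$$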
Consider “embedded” rational partitions $R(n) = \{t_1, \ldots, t_n\}$ of the interval $[0, t]$
such that $R(n) \subset R(n + 1)$ and $\bigcup R(n) = R_t$ is the set of all rationals in $[0, t]$.

Note the following three properties.

(1) Let $\nu(n)$ be the number of intervals in the partition $R(n)$ on which the increments
of the process $\xi$ are non-zero. For each $\omega$, $\nu(n)$ is non-decreasing as $n \to \infty$.
Furthermore, the number $\nu(n)$ can be represented as a sum of independent random
variables which are equal to 1 if there is an increment on the $i$-th interval and 0
otherwise. Therefore, by (19.4.1)

$$\mathbf{P}\bigl(\nu(n) \ne \xi(t)\bigr) = \mathbf{P}\Bigl(\bigcup_{t_i \in R(n)} \bigl\{\xi(t_i) - \xi(t_{i-1}) \ge 2\bigr\}\Bigr) = O\Bigl(\sum_{i=1}^{n} \Delta_i^2\Bigr) = o(1),$$

where $\sum_{i=1}^{n} \Delta_i^2 \le t\max_i \Delta_i \to 0$ as $n \to \infty$, so that a.s.

$$\nu(n) \uparrow \xi(t) \mathrel{\subset\!\!\!=} \mathbf{\Pi}_{\mu t}$$

as the partitions refine.

(2) Because the maximum length of the intervals $\Delta_i$ tends to 0 as $n \to \infty$, the
total length of the intervals containing jumps converges to 0.

Therefore, taking the unions of the remaining adjacent intervals $\Delta_i$ (i.e. where
there are no increments of $\xi$), for each $\omega$ we obtain in the limit, as $n \to \infty$, $\xi(t) + 1$
intervals $(0, T_1), (T_1, T_2), \ldots, (T_{\xi(t)}, t)$ on which the increments of $\xi$ are null.

(3) Finally, by (19.4.1) the probability that at least one of the increments on the
intervals $\Delta_i$ exceeds one is $\sum_{i=1}^{n} O(\Delta_i^2) = o(1)$ as $n \to \infty$, so that, with probability 1,
the jumps at the points $T_i$ are equal to 1.

Thus we have shown that, on the segment $[0, t]$, for each $\omega$ there exists a finite
number $\xi(t)$ of points $T_1, \ldots, T_{\xi(t)}$ such that $\xi(u)$ takes at the rational points of the
intervals $(T_k, T_{k+1})$ one and the same constant value equal to $k$. This means that one
can extend the trajectories of the process $\xi(u)$, say, by continuity from the right so
that $\xi(u) = k$ for all $u \in [T_k, T_{k+1})$.

Thus, for the original process $\xi(t)$ we have constructed a modification $\tilde{\xi}(t)$ with
trajectories in $D_+(T)$. The equivalence of $\xi$ and $\tilde{\xi}$ follows from the very construction
since, by virtue of (1),

$$\mathbf{P}\bigl(\xi(t) = \tilde{\xi}(t)\bigr) = \mathbf{P}\Bigl(\lim_{n \to \infty} \nu(n) = \xi(t)\Bigr) = 1.$$




One usually considers just such right (or left) continuous modifications of the
Poisson process. We have already dealt with processes of this kind in Chap. 10
where more general objects (renewal processes) were defined from scratch using
trajectories. That the Poisson process is a renewal process is seen from the following
considerations. It is easy to establish from relations (19.4.1) that the distributions of
the random variables $T_1, T_2 - T_1, T_3 - T_2, \ldots$ coincide and that these variables are
independent. Indeed, the difference $T_j - T_{j-1}$, $j \ge 1$, $T_0 = 0$, can be approximated
by the sum $(\gamma_j - \gamma_{j-1})\Delta$ of the lengths of identical intervals of size $\Delta_i = \Delta$, where
$\gamma_j$ is the number of the interval in which the $j$-th non-zero increment of $\xi$ occurred.
Since the process $\xi(t)$ is homogeneous with independent increments, we have

$$\mathbf{P}\bigl((\gamma_j - \gamma_{j-1})\Delta > u\bigr) = \mathbf{P}\Bigl(\gamma_1 > \frac{u}{\Delta}\Bigr) = \bigl(e^{-\mu\Delta}\bigr)^{[u/\Delta]} \to e^{-\mu u},$$

$$\mathbf{P}\bigl((\gamma_j - \gamma_{j-1})\Delta > u\bigr) \to \mathbf{P}(T_j - T_{j-1} > u)$$

as $\Delta \to 0$. Hence the variables $\tau_j := T_j - T_{j-1}$, $j = 1, 2, 3, \ldots$, have the exponential
distribution, and the value $\xi(t) + 1$ can be considered as the first crossing time
of the level $t$ by the sums $T_j$:

$$\xi(t) = \max\{k : T_k \le t\}, \qquad \xi(t) + 1 = \min\{k : T_k > t\}.$$

Thus we obtain that the Poisson process $\xi(t)$ coincides with the renewal process $\eta(t)$
(see Chap. 10) for exponentially distributed variables $\tau_1, \tau_2, \ldots$ with $\mathbf{P}(\tau_j > u) = e^{-\mu u}$.
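The renewal representation gives a simple way of simulating the Poisson process. The following sketch (the function names, the number of paths and the seed are our assumptions) builds $\xi(t) = \max\{k : T_k \le t\}$ from independent exponential variables $\tau_j$ and estimates the mean and the variance of $\xi(t)$, both of which should be close to $\mu t$.

```python
import random

def poisson_count(t, mu, rng):
    """xi(t) = max{k : T_k <= t}, where T_k is a sum of k independent Exp(mu) variables."""
    k, arrival = 0, rng.expovariate(mu)
    while arrival <= t:
        k += 1
        arrival += rng.expovariate(mu)
    return k

def sample_mean_var(t, mu, n_paths=50_000, seed=1):
    """Sample mean and sample variance of xi(t) over independent simulated paths."""
    rng = random.Random(seed)
    counts = [poisson_count(t, mu, rng) for _ in range(n_paths)]
    mean = sum(counts) / n_paths
    var = sum((c - mean) ** 2 for c in counts) / (n_paths - 1)
    return mean, var
```

For example, with $t = 2$ and $\mu = 1.5$ both returned values should be near $3$.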

The above and the properties of the Poisson process also imply the following
remarkable property of exponentially distributed random variables. The numbers of
jump points (i.e. sums $T_k$) which fall into disjoint time intervals $\delta_j$ are independent,
these numbers being distributed according to the Poisson laws with parameters $\mu\delta_j$.

Using the last fact, one can construct a more general model of a pure jump homogeneous
process with independent increments. Consider an arbitrary sequence
of independent identically distributed random variables $\zeta_1, \zeta_2, \ldots$ that have a ch.f.
$\beta(\lambda)$ and are independent of the $\sigma$-algebra generated by the process $\xi(t)$. Construct
now a new process $\zeta(t)$ as follows. To each $\omega$ we put into correspondence a new
trajectory obtained from the trajectory $\xi(t)$ by replacing the first unit jump with the
variable $\zeta_1$, the second one with the variable $\zeta_2$ and so on. It is easy to see that $\zeta(t)$
will also be a process with independent increments. The value $\zeta(t)$ will be equal to
the sum

$$\zeta(t) = \zeta_1 + \zeta_2 + \cdots + \zeta_{\xi(t)} \tag{19.4.2}$$

of the random number $\xi(t)$ of random variables $\zeta_1, \zeta_2, \ldots$, where $\xi(t)$ is independent
of $\{\zeta_k\}$ by construction.
Hence, by the total probability formula,

$$\mathbf{E}e^{i\lambda\zeta(t)} = \sum_{k=0}^{\infty} \mathbf{P}\bigl(\xi(t) = k\bigr)\,\mathbf{E}e^{i\lambda(\zeta_1 + \cdots + \zeta_k)} = \sum_{k=0}^{\infty} \frac{(\mu t)^k}{k!}e^{-\mu t}\beta^k(\lambda) = e^{-\mu t + \mu t\beta(\lambda)} = e^{\mu t(\beta(\lambda) - 1)}. \tag{19.4.3}$$
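The summation in (19.4.3) can be verified directly. In the sketch below the jump distribution (values $\pm 1$ with probability $1/2$, so that $\beta(\lambda) = \cos\lambda$) is a hypothetical choice made only for illustration, as are the function names and the truncation level of the series.

```python
import cmath
import math

def compound_chf_series(lam, t, mu, beta, terms=80):
    """Total probability formula: sum over k of P(xi(t) = k) * beta(lam)^k."""
    b = beta(lam)
    total, weight = 0j, math.exp(-mu * t)   # weight = P(xi(t) = 0)
    for k in range(terms):
        total += weight * b ** k
        weight *= mu * t / (k + 1)          # P(xi(t) = k+1) from P(xi(t) = k)
    return total

def compound_chf_closed(lam, t, mu, beta):
    """Closed form exp{mu t (beta(lam) - 1)} from (19.4.3)."""
    return cmath.exp(mu * t * (beta(lam) - 1))

# hypothetical jump law: +1 or -1 with probability 1/2 each
beta_pm1 = lambda lam: complex(math.cos(lam))
```

For moderate values of $\mu t$ the truncated series and the closed form agree to machine precision.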




Definition 19.4.2 The process $\zeta(t)$ defined by formula (19.4.2) or ch.f. (19.4.3) is called a
compound Poisson process. It is evidently a special case of the generalised renewal
process (see Sect. 10.6).


As we have already noted, it is again a homogeneous process with independent
increments. In formula (19.4.3), the parameter $\mu$ determines the jumps’ intensity
in the process $\zeta(t)$, while the ch.f. $\beta(\lambda)$ specifies their distribution. If we add a
constant “drift” $qt$ to the process $\zeta(t)$, then $\tilde{\zeta}(t) = \zeta(t) + qt$ will clearly also be
a homogeneous process with independent increments having the ch.f.
$\mathbf{E}e^{i\lambda\tilde{\zeta}(t)} = e^{t(i\lambda q + \mu(\beta(\lambda) - 1))}$.

Finally, if a Wiener process $w(t)$ with zero drift and diffusion coefficient $\sigma$ is
given on the same probability space and is independent of $\tilde{\zeta}(t)$, and to each $\omega$ we
put into correspondence a trajectory of $\tilde{\zeta}(t) + w(t)$, we again obtain a process with
independent increments, with ch.f. $\exp\{t(i\lambda q + \mu(\beta(\lambda) - 1) - \lambda^2\sigma^2/2)\}$.

One should note, however, that these constructions by no means exhaust the 
whole class of processes with independent increments (and therefore the class of 
infinitely divisible distributions). 

A description of the entire class will be given in the next section. 

The Poisson processes, as well as Wiener processes, are often used as mathemat- 
ical models in various applications. For example, the process of counts of cosmic 
particles of certain energy registered by a sensor in a given volume, or of collisions 
of elementary particles in an accelerator are described by the Poisson process. The 
same is true of the process of incoming telephone calls at a switchboard and many 
other processes. 

Due to representation (19.4.2), the study of compound Poisson processes re- 
duces, in many aspects, to the study of the properties of sums of independent random 
variables. 


19.5 Description of the Class of Processes with Independent 
Increments 


We saw in Theorem 19.1.1 that, to describe the class of distributions of stochasti- 
cally continuous processes with independent increments, it suffices to describe the 
class of all infinitely divisible distributions. Let, as before, $\mathfrak{L}$ be the class of the
ch.f.s of infinitely divisible distributions.


Lemma 19.5.1 The class $\mathfrak{L}$ is closed with respect to the operations of multiplication
and passage to the limit (when the limit is again a ch.f.).

Proof (1) Let $\varphi_1 \in \mathfrak{L}$ and $\varphi_2 \in \mathfrak{L}$. Then $\varphi_1\varphi_2 = (\varphi_1^{1/n}\varphi_2^{1/n})^n$, where $\varphi_1^{1/n}\varphi_2^{1/n}$ is
a ch.f.

(2) Let $\varphi_n \in \mathfrak{L}$, $\varphi_n \to \varphi$, and $\varphi$ be a ch.f. Then, for any $m$, $\varphi_n^{1/m} \to \varphi^{1/m}$
as $n \to \infty$, where $\varphi^{1/m}$ is continuous at zero and hence is a ch.f. The lemma is
proved.




Denote by $\mathcal{L}_\Pi \subset \mathcal{L}$ the class of ch.f.s whose logarithms have the form

$$\ln\varphi(\lambda) = i\lambda q + \sum_k c_k\big(e^{i\lambda b_k} - 1\big), \qquad c_k \ge 0, \quad \sum_k c_k < \infty.$$

We will call this the Poisson class. We already know that it corresponds to compound
Poisson processes with drift $q$ and intensities $c_k$ of jumps of size $b_k$ (note
that $\sum_k c_k(e^{i\lambda b_k} - 1) = \big(\sum_k c_k\big)\mathbf{E}(e^{i\lambda\zeta} - 1)$, where $\zeta$ assumes the values $b_k$ with
probabilities $c_k/\sum_j c_j$).
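The identity in the parenthetical note can be checked numerically. The following is a minimal sketch, not part of the original text: the function names and the sample pairs $(b_k, c_k)$ are illustrative assumptions, and only finitely many jump sizes are used.

```python
import cmath

def log_chf_poisson_class(lam, q, jumps):
    """ln phi(lam) = i*lam*q + sum_k c_k * (exp(i*lam*b_k) - 1).

    `jumps` is a list of (b_k, c_k) pairs: jump size and its intensity.
    """
    return 1j * lam * q + sum(c * (cmath.exp(1j * lam * b) - 1) for b, c in jumps)

def log_chf_via_jump_law(lam, q, jumps):
    """The same logarithm written as i*lam*q + (sum_k c_k) * E(exp(i*lam*zeta) - 1),
    where zeta takes the value b_k with probability c_k / sum_j c_j."""
    total = sum(c for _, c in jumps)
    mean_term = sum((c / total) * (cmath.exp(1j * lam * b) - 1) for b, c in jumps)
    return 1j * lam * q + total * mean_term

# Hypothetical jump sizes and intensities, chosen only for illustration.
JUMPS = [(-1.0, 0.5), (0.5, 2.0), (3.0, 0.25)]
DRIFT = 0.3

for lam in (-2.0, 0.0, 0.7, 5.0):
    gap = abs(log_chf_poisson_class(lam, DRIFT, JUMPS) - log_chf_via_jump_law(lam, DRIFT, JUMPS))
    print(f"lambda = {lam:5.1f}   difference = {gap:.2e}")
```

The two functions agree up to rounding, which is just the statement that a sum of independent Poisson streams of jumps is a compound Poisson process with total intensity $\sum_k c_k$.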


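Lemma 19.5.2 A ch.f. $\varphi$ belongs to $\mathcal{L}$ if and only if $\varphi = \lim_{n\to\infty}\varphi_n$, $\varphi_n \in \mathcal{L}_\Pi$.


Proof Sufficiency. Let

$$\ln\varphi_n = \sum_k \big(i\lambda q_{kn} + c_{kn}(e^{i\lambda b_{kn}} - 1)\big),$$

and $\varphi = \lim\varphi_n$ be a ch.f. It is evident that $\varphi_n^{1/m} \in \mathcal{L}_\Pi \subset \mathcal{L}$ and $\varphi_n^{1/m} \to \varphi^{1/m}$.
Therefore $\varphi^{1/m}$, being a limit of a sequence of ch.f.s which is continuous at zero, is
a ch.f. itself, so that $\varphi \in \mathcal{L}$.

Necessity. Let $\varphi \in \mathcal{L}$. Then $\varphi(\lambda) \ne 0$ and there exists $\beta := \ln\varphi$ with $n(\varphi^{1/n} - 1) \to \beta$, and

$$\varphi^{1/n} - 1 = \int (e^{i\lambda x} - 1)\,dF_n(x).$$

The integral of the continuous function on the right-hand side can be viewed as a
Riemann–Stieltjes integral. This means that for $F_n$ there exists a partition of the real
axis into intervals $\Delta_{nk}$ such that, for $x_{nk} \in \Delta_{nk}$ and $|r_n| < cn^{-2}$,

$$\int (e^{i\lambda x} - 1)\,dF_n(x) = \sum_k (e^{i\lambda x_{nk}} - 1)\,P_n(\Delta_{nk}) + r_n$$

($P_n(\Delta)$ is the probability of hitting the interval $\Delta$ corresponding to $F_n$). We obtain

$$\beta = \lim_{n\to\infty} n(\varphi^{1/n} - 1) = \lim_{n\to\infty} \sum_k n\,(e^{i\lambda x_{nk}} - 1)\,P_n(\Delta_{nk}).$$

The lemma is proved.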


Theorem 19.5.1 (Lévy–Khintchin) A ch.f. $\varphi$ belongs to $\mathcal{L}$ if and only if the function
$\beta := \ln\varphi$ admits a representation of the form

$$\beta = \beta(\lambda; q, \Psi) = i\lambda q + \int \Big(e^{i\lambda x} - 1 - \frac{i\lambda x}{1+x^2}\Big)\frac{1+x^2}{x^2}\,d\Psi(x), \tag{19.5.1}$$

where $\Psi$ is a non-decreasing function of bounded variation (i.e., a distribution function
up to a constant factor), the integrand being assumed equal to $-\lambda^2/2$ at the
point $x = 0$ (by continuity).
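To make representation (19.5.1) concrete, here is a small numerical sketch restricted to purely atomic $\Psi$, so that the integral becomes a finite sum. The function names and the test values are assumptions introduced for illustration; the sketch checks the value $-\lambda^2/2$ at $x = 0$ and two special cases (a single atom at $0$ gives a normal law, a single atom at $x = 1$ with matching drift gives a Poisson law).

```python
import cmath

def lk_integrand(lam, x):
    """Integrand of (19.5.1): (e^{i lam x} - 1 - i lam x/(1+x^2)) * (1+x^2)/x^2,
    defined by continuity as -lam^2/2 at x = 0."""
    if x == 0.0:
        return complex(-lam * lam / 2.0)
    core = cmath.exp(1j * lam * x) - 1 - 1j * lam * x / (1 + x * x)
    return core * (1 + x * x) / (x * x)

def log_chf_atomic(lam, q, atoms):
    """beta(lam; q, Psi) for an atomic Psi given as a list of (point, mass) pairs."""
    return 1j * lam * q + sum(mass * lk_integrand(lam, x) for x, mass in atoms)

# Continuity at zero: the value near x = 0 is close to -lam^2/2.
print(lk_integrand(2.0, 1e-4))

# Psi({0}) = sigma^2 = 4, q = 0: the ch.f. is exp(-sigma^2 lam^2 / 2).
print(cmath.exp(log_chf_atomic(1.5, 0.0, [(0.0, 4.0)])))

# Psi({1}) = 1/2, q = 1/2: the logarithm reduces to exp(i lam) - 1 (Poisson, intensity 1).
print(log_chf_atomic(0.7, 0.5, [(1.0, 0.5)]), cmath.exp(0.7j) - 1)
```

In the last case the drift $q = 1/2$ exactly cancels the compensating term $-i\lambda x/(1+x^2)$, which is why a pure Poisson exponent remains.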




Proof Assume that $\beta$ has the form (19.5.1). Then $\beta(\lambda)$ is a continuous function,
since it is (up to a continuous additive term $i\lambda q$) a uniformly convergent integral of
a continuous bounded function. Further, let $x_{kn} \ne 0$, $k = 1, \ldots, n$, be points of refining
partitions of intervals $[-\sqrt{n}, \sqrt{n})$. Then $\beta^0(\lambda) = \beta(\lambda) - i\lambda q$ can be represented
as $\beta^0 = \lim\beta_n$ with

$$\beta_n(\lambda) = \sum_{k=1}^n \big[i\lambda q_{kn} + c_{kn}(e^{i\lambda b_{kn}} - 1)\big] \in \mathcal{L}_\Pi,$$

where, under a natural notational convention, one should put

$$c_{kn} = \frac{1 + x_{kn}^2}{x_{kn}^2}\,\Psi\big([x_{kn}, x_{k+1,n})\big), \qquad q_{kn} = -\frac{1}{x_{kn}}\,\Psi\big([x_{kn}, x_{k+1,n})\big), \qquad b_{kn} = x_{kn},$$

$\Psi$ being used to denote the measure $\Psi(A) = \int_A d\Psi(x)$. We obtain that $\varphi$ is a limit
of the sequence of ch.f.s $\varphi_n \in \mathcal{L}_\Pi$. It remains to make use of Lemma 19.5.2.
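Now let $\varphi \in \mathcal{L}$. Then

$$\beta = \lim n(\varphi^{1/n} - 1) = \lim \int (e^{i\lambda x} - 1)\,n\,dF_n(x)$$
$$= \lim\Big\{ i\lambda\int \frac{nx}{1+x^2}\,dF_n(x) + \int \Big(e^{i\lambda x} - 1 - \frac{i\lambda x}{1+x^2}\Big)\frac{1+x^2}{x^2}\,\frac{nx^2}{1+x^2}\,dF_n(x)\Big\}. \tag{19.5.2}$$

If we put

$$q_n := \int \frac{nx}{1+x^2}\,dF_n(x), \qquad \Psi_n(x) := \int_{-\infty}^x \frac{nu^2}{1+u^2}\,dF_n(u), \tag{19.5.3}$$

then on the right-hand side of (19.5.2) we will have $\lim\beta_n$, $\beta_n = \beta(\lambda; q_n, \Psi_n)$.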

Now assume for a moment that the following continuity theorem holds for func- 
tions from £. 


Lemma 19.5.3 If $\beta_n = \beta(\lambda; q_n, \Psi_n) \to \beta$ and $\beta$ is continuous at the point $\lambda = 0$,
then $\beta(\lambda)$ has the form $\beta(\lambda; q, \Psi)$, $q_n \to q$ and $\Psi_n \Rightarrow \Psi$.


The symbol $\Rightarrow$ in the lemma means convergence at the points of continuity of
the limiting function (as in the case of distribution functions) and that $\Psi_n(\pm\infty) \to \Psi(\pm\infty)$.

If the lemma is true, the required assertion of the theorem will follow in an obvi- 
ous way from (19.5.2) and (19.5.3). It remains to prove the lemma. 


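Proof of Lemma 19.5.3 Observe first that the correspondence $\beta(\lambda; q, \Psi) \leftrightarrow (q, \Psi)$
is one-to-one. Since in one direction it is obvious, we only have to verify that $\beta$
uniquely determines $q$ and $\Psi$. To each $\beta$ we put into correspondence the function

$$\gamma(\lambda) = \int_0^1 \Big[\beta(\lambda) - \frac12\big(\beta(\lambda + h) + \beta(\lambda - h)\big)\Big]\,dh$$
$$= \int_0^1\!\!\int \Big[e^{i\lambda x} - \frac12\big(e^{i(\lambda+h)x} + e^{i(\lambda-h)x}\big)\Big]\frac{1+x^2}{x^2}\,d\Psi(x)\,dh,$$

where

$$\int_0^1 \Big(e^{i\lambda x} - \frac12\big(e^{i(\lambda+h)x} + e^{i(\lambda-h)x}\big)\Big)\,dh = \int_0^1 e^{i\lambda x}(1 - \cos hx)\,dh = e^{i\lambda x}\Big(1 - \frac{\sin x}{x}\Big),$$

$$0 < c_1 < \Big(1 - \frac{\sin x}{x}\Big)\frac{1+x^2}{x^2} < c_2 < \infty.$$

Therefore

$$\gamma(\lambda) = \int e^{i\lambda x}\,d\Gamma(x),$$

where

$$\Gamma(x) = \int_{-\infty}^x \Big(1 - \frac{\sin u}{u}\Big)\frac{1+u^2}{u^2}\,d\Psi(u)$$

is (up to a constant multiplier) a distribution function, for which $\gamma(\lambda)$ plays the role
of its ch.f. Clearly,

$$\Psi(x) = \int_{-\infty}^x \frac{u^2}{1+u^2}\Big(1 - \frac{\sin u}{u}\Big)^{-1} d\Gamma(u),$$

so that we obtained a chain of univalent correspondences $\beta \to \gamma \to \Gamma \to \Psi$ which
proves the assertion.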

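We return to the proof of Lemma 19.5.3. Because $e^{\beta_n} \to e^{\beta}$, $e^{\beta_n}$ is a ch.f., and
$e^{\beta}$ is continuous at the point $\lambda = 0$, we see that $e^{\beta}$ is a ch.f. and hence a continuous
function. This means that the convergence $\varphi_n \to \varphi$ is uniform on any interval,

$$\gamma_n(\lambda) = \int_0^1 \Big[\beta_n(\lambda) - \frac12\big(\beta_n(\lambda + h) + \beta_n(\lambda - h)\big)\Big]\,dh$$
$$\to \int_0^1 \Big[\beta(\lambda) - \frac12\big(\beta(\lambda + h) + \beta(\lambda - h)\big)\Big]\,dh =: \gamma(\lambda),$$

and the function $\gamma(\lambda)$ is continuous. By the continuity theorem for ch.f.s, this means
that $\gamma(\lambda)$ is a ch.f. (of a finite measure $\Gamma$), $\Gamma_n \Rightarrow \Gamma$ (where $\Gamma_n$ is the preimage of
$\gamma_n$), $\Psi_n \Rightarrow \Psi$, and $q_n \to q$. Thus we establish that

$$\beta = \lim\beta_n = \lim\Big\{ i\lambda q_n + \int \Big(e^{i\lambda x} - 1 - \frac{i\lambda x}{1+x^2}\Big)\frac{1+x^2}{x^2}\,d\Psi_n(x)\Big\}$$
$$= i\lambda q + \int \Big(e^{i\lambda x} - 1 - \frac{i\lambda x}{1+x^2}\Big)\frac{1+x^2}{x^2}\,d\Psi(x) = \beta(\lambda; q, \Psi).$$

Lemma 19.5.3 is proved.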




Theorem 19.5.1 is proved. 


Now we will make several remarks in regard to the structure of the process $\xi(t)$
and its relationship to representation (19.5.1). The function $\Psi$ in (19.5.1) corresponds
to the so-called spectral measure of the process $\xi(t)$ (recall that we agreed
to use the same symbol $\Psi$ for the measure itself: $\Psi(A) = \int_A d\Psi(x)$). It can be
represented in the form $\mu\Psi_0(x)$, where $\mu = \Psi(\infty) - \Psi(-\infty)$ and $\Psi_0(x)$ is a distribution
function.

(1) The spectral measure of the Wiener process is concentrated at the point 0. If
$\Psi(\{0\}) = \sigma^2$, then $\xi(1)$ has the normal distribution $\Phi_{0,\sigma^2}$.

(2) The spectral measure $\Psi$ of a compound Poisson process has the property

$$\int x^{-2}\,d\Psi(x) < \infty.$$

In that case

$$G(x) = \int_{-\infty}^x \frac{1+u^2}{u^2}\,d\Psi(u)$$

possesses the properties of a distribution function, and $\beta(\lambda; q, \Psi)$ may be written
in the form

$$i\lambda q_1 + \int (e^{i\lambda x} - 1)\,dG(x),$$

where

$$q_1 = q - \int x^{-1}\,d\Psi(x).$$

(3) Consider now the general case, but under the condition that $\Psi(\{0\}) = 0$. As
we know, the function $\beta$ can be approximated for small $\Delta$ by expressions of the
form (we put $\Delta_k = [(k-1)\Delta, k\Delta)$)

$$i\lambda q + \sum_{k \ne 0}\Big[-\frac{i\lambda}{k\Delta}\,\Psi(\Delta_k) + \big(e^{i\lambda k\Delta} - 1\big)\frac{1 + (k\Delta)^2}{(k\Delta)^2}\,\Psi(\Delta_k)\Big],$$

which corresponds to the sum of Poisson processes with jumps of sizes $k\Delta$ of the
respective intensities

$$\frac{1 + (k\Delta)^2}{(k\Delta)^2}\,\Psi(\Delta_k).$$

If, say,

$$\int_{+0} \frac{d\Psi(x)}{x^2} = \infty,$$

then for any $\varepsilon > 0$ the total intensity of these processes with jumps from the interval
$(0, \varepsilon)$ will increase to infinity as $\Delta \to 0$. This means that, with probability 1, on any
time interval $\delta$ there will be at least one jump of size smaller than any given $\varepsilon > 0$,
so that the trajectories of $\xi(t)$ will be everywhere discontinuous. To “compensate”
these jumps, a drift of size $\Psi(\Delta_k)/k\Delta$ is added, the “total value” of such drifts being
possibly unbounded (if $\int_{+0} x^{-1}\,d\Psi(x) = \infty$).

(4) For stable processes (see Sect. 8.8) the functions $\Psi(x)$ have power “branches”,
smooth on the half-axes, possessing the property $c_1\Psi'(x) = \Psi'(c_2 x)$ for appropriate
$c_1$ and $c_2$.


Chapter 20 
Functional Limit Theorems 


Abstract The chapter begins with Sect. 20.1 presenting the classical Functional 
Central Limit Theorem in the triangular array scheme. It establishes not only con- 
vergence of the distributions of the scaled trajectories of random walks to that of 
the Wiener process, but also convergence rates for Lipschitz sets and distribution
functions of Lipschitz functionals in the case of finite third moments when the
Lyapunov condition is met. Section 20.2 uses the Law of the Iterated Logarithm for
the Wiener process to establish such a law for the trajectory of a random walk with
independent non-identically distributed jumps. Section 20.3 is devoted to proving 
convergence to the Poisson process of the processes of cumulative sums of indepen- 
dent random indicators with low success probabilities and also that of the so-called 
thinning renewal processes. 


20.1 Convergence to the Wiener Process 


We have already pointed out in Sect. 19.2 that the Wiener processes are, in a certain
sense, limiting to random polygons with vertices at the points $(k/n, S_k/\sqrt{n})$, where
$S_k = \xi_1 + \cdots + \xi_k$ are partial sums of independent identically distributed random
variables $\xi_1, \xi_2, \ldots$ with zero means and finite variances. Now we will give a more
precise and general meaning to this statement.
Let

$$\xi_{1,n}, \ldots, \xi_{n,n} \tag{20.1.1}$$

be independent random variables in the triangular array scheme (see Sects. 8.3, 8.4),

$$\zeta_{k,n} := \sum_{j=1}^k \xi_{j,n}, \qquad \mathbf{E}\xi_{k,n} = 0, \qquad \mathbf{E}\xi_{k,n}^2 = \sigma_{k,n}^2,$$

that have finite third moments $\mathbf{E}|\xi_{k,n}|^3 = \mu_{k,n} < \infty$.
We will assume without loss of generality (see Sect. 8.4) that

$$\mathrm{Var}(\zeta_{n,n}) = \sum_{j=1}^n \sigma_{j,n}^2 = 1.$$


A.A. Borovkov, Probability Theory, Universitext, 559 
DOI 10.1007/978-1-4471-5201-9_20, © Springer-Verlag London 2013 




Fig. 20.1 The random polygon $s_n(t)$ constructed from the random walk $\zeta_0, \zeta_1, \zeta_2, \ldots$


Put

$$t_{k,n} := \sum_{j=1}^k \sigma_{j,n}^2,$$

so that $t_{0,n} = 0$, $t_{n,n} = 1$, and consider a random polygon with vertices at the points
$(t_k, \zeta_k)$, where we suppress the second subscript $n$ for brevity's sake: $t_k = t_{k,n}$,
$\zeta_k = \zeta_{k,n}$.

We obtain a random process on $[0, 1]$ with continuous trajectories, which will
be denoted by $s_n = s_n(t)$ (see Fig. 20.1). The functional limit theorem (or invariance
principle; the motivation behind this second name will be commented on below)
states that for any functional $f$ given on the space $C(0, 1)$ and continuous in
the uniform metric, the distribution of $f(s_n)$ converges weakly to that of $f(w)$ as
$n \to \infty$:

$$f(s_n) \Rightarrow f(w), \tag{20.1.2}$$

where $w = w(t)$ is the standard Wiener process. The conventional central limit theorem
is a special case of this statement (one should take $f(x)$ to be $x(1)$).
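The construction of the polygon $s_n(t)$ can be sketched in a few lines. This is an illustration under stated assumptions, not the author's code: the function names are hypothetical, and the caller is assumed to supply variances $\sigma_{k,n}^2$ that already sum to 1, as in the text.

```python
from bisect import bisect_right
from itertools import accumulate

def polygon_vertices(xi, sigma2):
    """Vertices (t_k, zeta_k), k = 0..n, where t_k is the cumulative variance
    and zeta_k the cumulative sum of the increments xi_{1,n}, ..., xi_{k,n}."""
    times = [0.0] + list(accumulate(sigma2))
    zetas = [0.0] + list(accumulate(xi))
    return times, zetas

def polygon_value(t, times, zetas):
    """Value s_n(t), 0 <= t <= 1, of the polygon obtained by joining
    consecutive vertices with straight line segments."""
    k = min(bisect_right(times, t), len(times) - 1)  # right end of the segment containing t
    t0, t1 = times[k - 1], times[k]
    if t1 == t0:                                     # degenerate segment (zero variance)
        return zetas[k]
    w = (t - t0) / (t1 - t0)
    return (1 - w) * zetas[k - 1] + w * zetas[k]

# Example: n = 4 steps of size +-1/sqrt(4), each with variance 1/4.
times, zetas = polygon_vertices([0.5, -0.5, 0.5, 0.5], [0.25] * 4)
print(times)                                # [0.0, 0.25, 0.5, 0.75, 1.0]
print(polygon_value(0.125, times, zetas))   # halfway up the first segment
print(polygon_value(1.0, times, zetas))     # equals zeta_n, the functional x(1)
```

Evaluating the polygon at $t = 1$ returns $\zeta_{n,n}$, which is the choice $f(x) = x(1)$ that reduces (20.1.2) to the ordinary central limit theorem.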

The above assertion is equivalent to each of the following two statements:

1. For any bounded continuous functional $f$,

$$\mathbf{E}f(s_n) \to \mathbf{E}f(w), \qquad n \to \infty. \tag{20.1.3}$$

2. For any set $G$ from the $\sigma$-algebra $\mathfrak{B}_{C(0,1)}$ of Borel sets in the space
$C(0, 1)$ ($\mathfrak{B}_{C(0,1)}$ is generated by open balls in the metric space $C(0, 1)$ endowed
with the uniform distance $\rho$; as we already noted, it coincides with the $\sigma$-algebra
generated by the cylinder sets) such that
$\mathbf{P}(w \in \partial G) = 0$, where $\partial G$ is the boundary of the set $G$, one has

$$\mathbf{P}(s_n \in G) \to \mathbf{P}(w \in G), \qquad n \to \infty. \tag{20.1.4}$$

Relations (20.1.3) and (20.1.4) are equivalent definitions of weak convergence of
the distributions $\mathbf{P}_n$ of the processes $s_n$ to the distribution $W$ of the Wiener process
$w$ in the space $\langle C(0, 1), \mathfrak{B}_{C(0,1)}\rangle$. More details can be found in Appendix 3 and in
[1] and [14].

The main results of the present section are the following theorems. 

As before, put $L_3 := \sum_{k=1}^n \mu_{k,n}$.


Theorem 20.1.1 Let $L_3 \to 0$ as $n \to \infty$ (the Lyapunov condition). Then the convergence
relations (20.1.2)–(20.1.4) hold true.




Remark 20.1.1 The condition $L_3 \to 0$ can be relaxed here to the Lindeberg condition.
In this version the above convergence theorem is known under the name of the
Donsker–Prokhorov invariance principle.


Along with Theorem 20.1.1 we will obtain a more precise assertion. 


Definition 20.1.1 A set $G$ is said to be Lipschitz if $W(G^{(\varepsilon)}) - W(G) < c\varepsilon$ for some
$c < \infty$, where $G^{(\varepsilon)}$ is the $\varepsilon$-neighbourhood of $G$ and $W$ is the measure corresponding
to the Wiener process.


In the sequel we will denote by the letter c (with or without subscripts) absolute 
constants, possibly having different values. 


Theorem 20.1.2 If $G$ is a Lipschitz set, then

$$\big|\mathbf{P}(s_n \in G) - \mathbf{P}(w \in G)\big| < cL_3^{1/4}. \tag{20.1.5}$$

In the case when $\xi_{k,n} = \xi_k/\sqrt{n}$, where the $\xi_k$ do not depend on $n$ and are identically
distributed with $\mathbf{E}\xi_k = 0$ and $\mathrm{Var}(\xi_k) = 1$, the right-hand side of (20.1.5)
becomes $cn^{-1/8}$ (indeed, in that case $L_3 = \mathbf{E}|\xi_1|^3\, n^{-1/2}$).

A similar bound can be obtained for functionals. A functional on $C(0, 1)$ is said
to be Lipschitz if the following two conditions are met:


(1) $|f(x) - f(y)| < c\rho(x, y)$;
(2) the distribution of $f(w)$ has a bounded density.


Corollary 20.1.1 If $f$ is a Lipschitz functional, then $G_v := \{f(x) < v\}$ is a Lipschitz
set (with one and the same constant for all $v$), so that by Theorem 20.1.2

$$\sup_v \big|\mathbf{P}(f(w) < v) - \mathbf{P}(f(s_n) < v)\big| < cL_3^{1/4}.$$


The above theorems are consequences of Theorem 20.1.3 to be stated below.
Let

$$\eta_{1,n}, \ldots, \eta_{n,n} \tag{20.1.6}$$

be any other sequence of independent identically distributed random variables in the
triangular array scheme with the same two first moments $\mathbf{E}\eta_{k,n} = 0$, $\mathbf{E}\eta_{k,n}^2 = \sigma_{k,n}^2$,
and finite third moments. Denote by $F_{k,n}$ and $\Phi_{k,n}$ the distribution functions of $\xi_{k,n}$
and $\eta_{k,n}$, respectively, and put

$$\nu_{k,n} := \mathbf{E}|\eta_{k,n}|^3 < \infty, \qquad N_3 := \sum_{k=1}^n \nu_{k,n},$$

$$\mu_{k,n}^0 := \int |x|^3\,\big|d\big(F_{k,n}(x) - \Phi_{k,n}(x)\big)\big| \le \mu_{k,n} + \nu_{k,n},$$

$$L_3^0 := \sum_{k=1}^n \mu_{k,n}^0 \le L_3 + N_3.$$
k=1 




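Denote by $s_n'(t)$ the random process constructed in the same way as $s_n(t)$ but using
the sequence $\{\eta_{k,n}\}$.


Theorem 20.1.3 For any $A \in \mathfrak{B}_{C(0,1)}$ and any $\varepsilon > 0$,

$$\mathbf{P}(s_n \in A) \le \mathbf{P}\big(s_n' \in A^{(2\varepsilon)}\big) + \frac{cL_3^0}{\varepsilon^3}.$$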


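In order to prove Theorem 20.1.3, we will first obtain its finite-dimensional
analogue. Denote by $\zeta$ and $\eta$ the vectors $\zeta = (\zeta_1, \ldots, \zeta_n)$ and $\eta = (\eta_1, \ldots, \eta_n)$
respectively, where $\zeta_k := \sum_{j=1}^k \xi_{j,n}$ and $\eta_k := \sum_{j=1}^k \eta_{j,n}$, and by $B^{(\varepsilon)}$ the
$\varepsilon$-neighbourhood of a set $B \subset \mathbb{R}^n$:

$$B^{(\varepsilon)} := \bigcup_{x \in B,\ |v| < \varepsilon} (x + v),$$

where $x = (x_1, \ldots, x_n)$, $v = (v_1, \ldots, v_n)$, and $|v| = \max_k |v_k|$.


Lemma 20.1.1 Let $B$ be an arbitrary Borel subset of $\mathbb{R}^n$. Then, for any $\varepsilon > 0$,

$$\mathbf{P}(\zeta \in B) \le \mathbf{P}\big(\eta \in B^{(2\varepsilon)}\big) + \frac{cL_3^0}{\varepsilon^3}.$$


Proof¹ Introduce a collection of nested neighbourhoods

$$B^{(\varepsilon)}(k) := \bigcup_{x \in B,\ |v| < \varepsilon} (x_1, \ldots, x_k, x_{k+1} + v_{k+1}, \ldots, x_n + v_n), \qquad k = 0, \ldots, n,$$

$$B = B^{(\varepsilon)}(n) \subset B^{(\varepsilon)}(n-1) \subset \cdots \subset B^{(\varepsilon)}(1) \subset B^{(\varepsilon)}(0) = B^{(\varepsilon)},$$

and denote by $e_k$ the vector $(0, \ldots, 0, 1, 0, \ldots, 0)$, where 1 stands in the $k$-th position.
It is obvious that if $x \in B^{(\varepsilon)}(k)$, then

$$x + e_k v \in B^{(\varepsilon)}(k-1) \quad \text{if } |v| < \varepsilon. \tag{20.1.7}$$

Further, together with arrays (20.1.1) and (20.1.6), consider the collection of
“transitional” arrays

$$\xi_{1,n}, \ldots, \xi_{k,n}, \eta_{k+1,n}, \ldots, \eta_{n,n}, \qquad k = 0, \ldots, n. \tag{20.1.8}$$

Denote by $\zeta(k) = (\zeta_1(k), \ldots, \zeta_n(k))$ the vectors formed by the cumulative sums of
random variables from the $k$-th row (20.1.8), so that

$$\zeta_j(k) = \begin{cases} \zeta_j & \text{for } j \le k, \\ \zeta_k + \eta_{k+1,n} + \cdots + \eta_{j,n} & \text{for } j > k. \end{cases}$$

To continue the proof of Lemma 20.1.1 we need the following.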

To continue the proof of Lemma 20.1.1 we need the following. 


¹ The extension of the approach to proving the central limit theorem used in Sect. 8.5, which is
used in this demonstration, was suggested by A.V. Sakhanenko.




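Lemma 20.1.2 For any random variable $\delta$ such that $\mathbf{P}(|\delta| < \varepsilon) = 1$, one has

$$\mathbf{P}(\zeta \in B) \le \mathbf{P}\big(\eta \in B^{(2\varepsilon)}\big) + \sum_{k=1}^n \Delta_k, \tag{20.1.9}$$

where

$$\Delta_k := \mathbf{P}\big(\zeta(k) + \delta e(k-1) \in B^{(\varepsilon)}(k-1)\big) - \mathbf{P}\big(\zeta(k-1) + \delta e(k-1) \in B^{(\varepsilon)}(k-1)\big),$$

$$e(r) := \sum_{j=r+1}^n e_j = (0, \ldots, 0, 1, \ldots, 1).$$

Proof Indeed, by virtue of (20.1.7),

$$\mathbf{P}(\zeta \in B) = \mathbf{P}\big(\zeta(n) \in B^{(\varepsilon)}(n)\big) \le \mathbf{P}\big(\zeta(n) + e(n-1)\delta \in B^{(\varepsilon)}(n-1)\big)$$
$$= \mathbf{P}\big(\zeta(n-1) + e(n-1)\delta \in B^{(\varepsilon)}(n-1)\big) + \Delta_n.$$

Reapplying the same calculations to the right-hand side, we obtain that

$$\mathbf{P}\big(\zeta(n-1) + e(n-1)\delta \in B^{(\varepsilon)}(n-1)\big)$$
$$\le \mathbf{P}\big(\zeta(n-1) + e(n-1)\delta + e_{n-1}\delta \in B^{(\varepsilon)}(n-2)\big)$$
$$= \mathbf{P}\big(\zeta(n-1) + e(n-2)\delta \in B^{(\varepsilon)}(n-2)\big)$$
$$= \mathbf{P}\big(\zeta(n-2) + e(n-2)\delta \in B^{(\varepsilon)}(n-2)\big) + \Delta_{n-1},$$
$$\ldots$$
$$\mathbf{P}\big(\zeta(1) + e(1)\delta \in B^{(\varepsilon)}(1)\big) \le \mathbf{P}\big(\zeta(1) + e(1)\delta + e_1\delta \in B^{(\varepsilon)}(0)\big)$$
$$= \mathbf{P}\big(\zeta(1) + e(0)\delta \in B^{(\varepsilon)}(0)\big) = \mathbf{P}\big(\zeta(0) + e(0)\delta \in B^{(\varepsilon)}(0)\big) + \Delta_1.$$

Since $\zeta(0) = \eta$ and $\mathbf{P}(\eta + e(0)\delta \in B^{(\varepsilon)}) \le \mathbf{P}(\eta \in B^{(2\varepsilon)})$, inequality (20.1.9) is
proved. Lemma 20.1.2 is proved.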


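To obtain Lemma 20.1.1, we now have to estimate $\Delta_k$. It will be convenient to
consider, along with (20.1.8), the sequences

$$\xi_{1,n}, \ldots, \xi_{k-1,n}, y, \eta_{k+1,n}, \ldots, \eta_{n,n}$$

and denote by $\zeta(k, y) = (\zeta_1(k, y), \ldots, \zeta_n(k, y))$ the respective vectors of cumulative
sums, so that

$$\zeta(k, \xi_{k,n}) = \zeta(k) = \zeta(k, 0) + \xi_{k,n}\,e(k-1),$$
$$\zeta(k, \eta_{k,n}) = \zeta(k-1) = \zeta(k, 0) + \eta_{k,n}\,e(k-1).$$

Then $\Delta_k$ can be written in the form

$$\Delta_k = \mathbf{P}\big(\zeta(k, 0) + (\delta + \xi_{k,n})e(k-1) \in B^{(\varepsilon)}(k-1)\big) - \mathbf{P}\big(\zeta(k, 0) + (\delta + \eta_{k,n})e(k-1) \in B^{(\varepsilon)}(k-1)\big). \tag{20.1.10}$$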




Take $\delta$ to be a random variable independent of $\zeta$ and $\eta$. Then it will be convenient
to use conditional expectation to estimate the probabilities participating in (20.1.10),
because, for instance, in the equality

$$\mathbf{P}\big(\zeta(k, 0) + (\delta + \xi_{k,n})e(k-1) \in B^{(\varepsilon)}(k-1)\big) = \mathbf{E}\,\mathbf{P}\big((\delta + \xi_{k,n})e(k-1) \in B^{(\varepsilon)}(k-1) - \zeta(k, 0) \,\big|\, \zeta(k, 0)\big) \tag{20.1.11}$$

the set $C = B^{(\varepsilon)}(k-1) - \zeta(k, 0)$ may be assumed fixed (see the properties of conditional
expectations; here $\delta$ and $\xi_{k,n}$ are independent of $\zeta(k, 0)$). Denote by $D$ the
set of all $y$'s for which $y\,e(k-1) \in C$. We have to bound the difference

$$\mathbf{P}(\delta + \xi_{k,n} \in D) - \mathbf{P}(\delta + \eta_{k,n} \in D). \tag{20.1.12}$$


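We make use of Lemma 8.5.1. To transform (20.1.12) to a form convenient for
applying the lemma, take $\delta$ to be a random variable having a thrice continuously
differentiable density $g(t)$ and put for brevity $\xi_{k,n} = \xi$ and $\eta_{k,n} = \eta$. Then $\delta + \xi$
will have a density equal to

$$\int dF_{k,n}(t)\,g(y - t) = \mathbf{E}g(y - \xi),$$

so that

$$\mathbf{P}(\delta + \xi \in D) = \int_D \mathbf{E}g(y - \xi)\,dy = \mathbf{E}\int_D g(y - \xi)\,dy.$$

Now putting

$$h(x) := \int_D g(y - x)\,dy,$$

we have

$$\mathbf{P}(\delta + \xi \in D) = \mathbf{E}h(\xi),$$

where $h$ is a thrice continuously differentiable function,

$$|h'''(x)| \le \int |g'''(y)|\,dy =: h_3.$$

Applying now Lemma 8.5.1 we obtain that

$$\big|\mathbf{P}(\delta + \xi \in D) - \mathbf{P}(\delta + \eta \in D)\big| = \big|\mathbf{E}\big(h(\xi) - h(\eta)\big)\big| \le \frac{h_3}{6}\,\mu_{k,n}^0,$$

$$\mu_{k,n}^0 = \int |x|^3\,\big|d\big(F_{k,n}(x) - \Phi_{k,n}(x)\big)\big|.$$

Because the right-hand side here does not depend on $\zeta(k, 0)$ and $D$ in any way, we
get, returning to (20.1.10) and (20.1.11), the estimate

$$|\Delta_k| \le \frac{h_3}{6}\,\mu_{k,n}^0. \tag{20.1.13}$$

Now let $g_1(x)$ be a smooth density concentrated on $[-1, 1]$. Then, putting

$$g(x) := g_1\Big(\frac{x}{\varepsilon}\Big)\frac{1}{\varepsilon},$$

we obtain that

$$h_3 = \int \frac{1}{\varepsilon^4}\Big|g_1'''\Big(\frac{y}{\varepsilon}\Big)\Big|\,dy = \frac{1}{\varepsilon^3}\int |g_1'''(y)|\,dy = \frac{c_1}{\varepsilon^3}, \qquad c_1 = \text{const}. \tag{20.1.14}$$

The assertion of Lemma 20.1.1 now follows from (20.1.9), (20.1.13) and
(20.1.14).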


Proof of Theorem 20.1.3 This theorem is a consequence of Lemma 20.1.1. Indeed,
let $B \subset \mathbb{R}^n$ be such that the events $\{s_n \in A\}$ and $\{\zeta \in B\}$ are equivalent ($s_n$ is completely
determined by $\zeta$). Then clearly $\{s_n \in A^{(2\varepsilon)}\} = \{\zeta \in B^{(2\varepsilon)}\}$ and the assertion of
Theorem 20.1.3 repeats that of Lemma 20.1.1. Theorem 20.1.3 is proved.


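Proof of Theorem 20.1.1 Let $w(t)$ be the standard Wiener process. Put $\eta_{k,n} :=
w(t_{k,n}) - w(t_{k-1,n})$. Then the sequence $\eta_{1,n}, \ldots, \eta_{n,n}$ satisfies all the required conditions,
for

$$\mathbf{E}\eta_{k,n} = 0, \qquad \mathbf{E}\eta_{k,n}^2 = \sigma_{k,n}^2, \qquad \nu_{k,n} = \mathbf{E}|\eta_{k,n}|^3 = c_3\sigma_{k,n}^3 < \infty.$$

Note also that

$$\sigma_{k,n}^3 = \big(\mathbf{E}|\xi_{k,n}|^2\big)^{3/2} \le \mathbf{E}|\xi_{k,n}|^3 = \mu_{k,n},$$

so that

$$N_3 = \sum_{k=1}^n \nu_{k,n} = c_3\sum_{k=1}^n \sigma_{k,n}^3 \le c_3 L_3 \to 0.$$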


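We will need the following

Lemma 20.1.3 $\mathbf{P}(\rho(w, s_n') > \varepsilon) \le cN_3/\varepsilon^3$.


Proof The event $\{\rho(w, s_n') > \varepsilon\}$ is equal to $\bigcup_{k=1}^n A_k$, where

$$A_k := \Big\{\sup_{t \in I_k} |w(t) - s_n'(t)| > \varepsilon\Big\} \subset \Big\{\sup_{t \in I_k} |w(t) - w(t_{k-1})| > \frac{\varepsilon}{2}\Big\}, \qquad I_k := [t_{k-1}, t_k].$$

Therefore, recalling that $t_k - t_{k-1} = \sigma_{k,n}^2$ and $w(t) \overset{d}{=} \sigma w(t/\sigma^2)$, we have

$$\mathbf{P}(A_k) \le \mathbf{P}\Big(\sup_{t \in [0,1]} |w(t)| > \frac{\varepsilon}{2\sigma_{k,n}}\Big) \le 4\Big(1 - \Phi\Big(\frac{\varepsilon}{2\sigma_{k,n}}\Big)\Big).$$

The function $1 - \Phi(t)$ vanishes as $t \to \infty$ much faster than $t^{-3}$. Hence

$$\mathbf{P}(A_k) \le \frac{c\,\sigma_{k,n}^3}{\varepsilon^3}, \qquad \mathbf{P}\Big(\bigcup_{k=1}^n A_k\Big) \le \frac{c}{\varepsilon^3}\sum_{k=1}^n \sigma_{k,n}^3 \le \frac{cN_3}{\varepsilon^3}.$$

Lemma 20.1.3 is proved.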


We see from the proof that the bound stated by Lemma 20.1.3 is rather crude. 




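We return to the proof of Theorem 20.1.1. Because

$$\mathbf{P}(s_n' \in G) = \mathbf{P}\big(s_n' \in G,\ \rho(w, s_n') \le \varepsilon\big) + \mathbf{P}\big(s_n' \in G,\ \rho(w, s_n') > \varepsilon\big),$$

we have

$$\mathbf{P}(s_n' \in G) \le \mathbf{P}\big(w \in G^{(\varepsilon)}\big) + \frac{cN_3}{\varepsilon^3} \tag{20.1.15}$$

and, by Theorem 20.1.3,

$$\mathbf{P}(s_n \in G) \le \mathbf{P}\big(w \in G^{(3\varepsilon)}\big) + \frac{c(L_3^0 + N_3)}{\varepsilon^3}.$$

Now we prove the converse inequality. Introduce the set $G^{(-\varepsilon)} := G - (\partial G)^{(\varepsilon)}$.
Then $[G^{(-2\varepsilon)}]^{(2\varepsilon)} =: G^0 \subset G$. Swapping $s_n$ and $s_n'$ in Theorem 20.1.3 and applying
the latter to the set $G^{(-2\varepsilon)}$, we obtain

$$\mathbf{P}(s_n \in G^0) \ge \mathbf{P}\big(s_n' \in G^{(-2\varepsilon)}\big) - \frac{c(L_3^0 + N_3)}{\varepsilon^3}. \tag{20.1.16}$$

Swapping $w$ and $s_n'$ in (20.1.15) and applying that relation to $G^{(-3\varepsilon)}$, we find that

$$\mathbf{P}\big(s_n' \in G^{(-2\varepsilon)}\big) \ge \mathbf{P}\big(w \in G^{(-3\varepsilon)}\big) - \frac{cN_3}{\varepsilon^3}.$$

This and (20.1.16) imply that

$$\mathbf{P}(s_n \in G) \ge \mathbf{P}(s_n \in G^0) \ge \mathbf{P}\big(w \in G^{(-3\varepsilon)}\big) - \frac{c(L_3^0 + N_3)}{\varepsilon^3}.$$

Setting

$$\mathbf{P}\big(w \in G^{(\pm\varepsilon)}\big) - \mathbf{P}(w \in G) = W\big(G^{(\pm\varepsilon)}\big) - W(G) =: W_G(\pm\varepsilon)$$

and taking into account that $N_3 \le cL_3$ and $L_3^0 \le L_3 + N_3$, we will obtain that

$$W_G(-3\varepsilon) - \frac{cL_3}{\varepsilon^3} \le \mathbf{P}(s_n \in G) - W(G) \le W_G(3\varepsilon) + \frac{cL_3}{\varepsilon^3}. \tag{20.1.17}$$

If $W(\partial G) = 0$ then clearly

$$W\big(G^{(3\varepsilon)}\big) - W\big(G^{(-3\varepsilon)}\big) \to 0$$

as $\varepsilon \to 0$, and $W_G(\pm 3\varepsilon) \to 0$. From this and (20.1.17) it is easy to derive that

$$\mathbf{P}(s_n \in G) \to \mathbf{P}(w \in G), \qquad n \to \infty.$$

Convergence $f(s_n) \Rightarrow f(w)$ for continuous functionals follows from (20.1.4),
since if $v$ is a point of continuity of the distribution of $f(w)$ then the set $G_v = \{x \in
C(0, 1) : f(x) < v\}$ has the property

$$W(\partial G_v) \le \mathbf{P}(f(w) = v) = 0$$

and therefore

$$\mathbf{P}(f(s_n) < v) \to \mathbf{P}(f(w) < v).$$

Theorem 20.1.1 is proved.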




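Proof of Theorem 20.1.2 If $G$ is a Lipschitz set, then

$$|W_G(\pm 3\varepsilon)| < c\varepsilon,$$

and by (20.1.17)

$$\big|\mathbf{P}(s_n \in G) - W(G)\big| < c\Big(\varepsilon + \frac{L_3}{\varepsilon^3}\Big).$$

Putting $\varepsilon := L_3^{1/4}$ we obtain the required assertion. Theorem 20.1.2 is proved.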


The reason for the name “invariance principle” used to refer to the main assertions
of this section is best illustrated by Theorem 20.1.3. By virtue of the theorem,
one can approximate the value of $\mathbf{P}(s_n \in A)$ by $\mathbf{P}(s_n' \in A)$ for any other sequence
(20.1.6) having the same first two moments as (20.1.1). In that sense, the asymptotics
of $\mathbf{P}(s_n \in A)$ are invariant with respect to particular distributions of the underlying
sequences with fixed first two moments. For example, the calculation of
$\mathbf{P}(s_n \in G)$ or $\mathbf{P}(w \in G)$ can be replaced with that of $\mathbf{P}(s_n' \in G)$ for a Bernoulli
sequence, which is convenient for various numerical methods. On the other hand, the
probabilities $\mathbf{P}(w \in G)$ for a whole class of regions $G$ were found in explicit form
(see e.g. [32]). We know, for example, that $\mathbf{P}(\sup_{t \in [0,1]} w(t) > y) = 2(1 - \Phi(y))$.
(This implies, in particular, that $G = \{x \in C(0, 1) : \sup_{t \in [0,1]} x(t) > y\}$ is a Lipschitz
set.) Hence for the distribution of the maximum $\overline{S}_n = \max_{k \le n} S_k$ of the sums
$S_k = \sum_{j=1}^k \xi_j$, when $\mathbf{E}\xi_k = 0$ and $\mathrm{Var}\,\xi_k = \sigma^2$, we have

$$\mathbf{P}\big(\overline{S}_n > x\sigma\sqrt{n}\big) \to 2\big(1 - \Phi(x)\big), \qquad n \to \infty,$$

and one can use this relation for the approximate calculation of the distribution of
$\overline{S}_n$ which is, as we saw in Chap. 12, of substantial interest in applications.
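The limit relation for the maximum can be illustrated for the simplest Bernoulli sequence, the symmetric $\pm 1$ walk ($\sigma = 1$), for which the probability on the left can be computed exactly by dynamic programming. This is a sketch for illustration only; the function names and the choice of the walk are assumptions, not part of the text.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_max_reaches(n, level):
    """Exact P(max_{k<=n} S_k >= level) for the symmetric +-1 walk (integer level >= 1)."""
    alive = {0: 1.0}   # alive[pos]: probability of being at pos without having reached level
    reached = 0.0
    for _ in range(n):
        nxt = {}
        for pos, mass in alive.items():
            for step in (-1, 1):
                new = pos + step
                if new >= level:
                    reached += 0.5 * mass
                else:
                    nxt[new] = nxt.get(new, 0.0) + 0.5 * mass
        alive = nxt
    return reached

def prob_max_exceeds(n, x):
    """P(max_{k<=n} S_k > x * sqrt(n)) for the symmetric +-1 walk."""
    level = math.floor(x * math.sqrt(n)) + 1   # smallest integer strictly above x*sqrt(n)
    return prob_max_reaches(n, level)

def wiener_limit(x):
    """Limiting value 2 * (1 - Phi(x))."""
    return 2.0 * (1.0 - normal_cdf(x))

for n in (100, 400, 900):
    print(n, round(prob_max_exceeds(n, 1.0), 4), round(wiener_limit(1.0), 4))
```

The exact probabilities approach $2(1 - \Phi(1))$ slowly as $n$ grows, which is in line with the modest convergence rates given by Theorem 20.1.2.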

In the same way we can approximate the joint distribution of $S_n$, $\overline{S}_n$ and $\underline{S}_n :=
\min_{k \le n} S_k$ (i.e. the probabilities of the form $\mathbf{P}(\overline{S}_n < x\sqrt{n},\ \underline{S}_n > y\sqrt{n},\ S_n \in B)$)
using the respective formulas for the Wiener process given in Skorokhod (1991).


Remark 20.1.2 In conclusion of this section note that all the above assertions will
remain true if, instead of $s_n(t)$, we consider in them the step function $s_n^*(t) = \zeta_{k,n}$
for $t \in [t_k, t_{k+1})$. One can verify this by repeating anew all the arguments for $s_n^*$.
Another way to obtain, say, Theorems 20.1.1 and 20.1.2 for $s_n^*$ is to make use of the
already obtained results and bound the distance $\rho(s_n, s_n^*)$. Because

$$\{\rho(s_n, s_n^*) > \varepsilon\} \subset \bigcup_{k=1}^n \{|\xi_{k,n}| > \varepsilon\},$$

one has

$$\mathbf{P}\big(\rho(s_n, s_n^*) > \varepsilon\big) \le \sum_{k=1}^n \mathbf{P}(|\xi_{k,n}| > \varepsilon) \le \sum_{k=1}^n \frac{\mu_{k,n}}{\varepsilon^3} = \frac{L_3}{\varepsilon^3}.$$

Recall that a similar bound was obtained for $\rho(s_n', w)$, and this allowed us to
replace, where it was needed, the process $s_n'$ with $w$. Therefore, using the same
argument, one can replace $s_n$ with $s_n^*$. In that case, we can consider convergence
of the distributions of functionals $f(s_n^*)$ defined on $D(0, 1)$ (and continuous in the
uniform metric $\rho$). Sometimes the use of $s_n^*$ is more convenient than that of $s_n$. This
is the case, for example, when one has to find the limiting distribution of

$$\sum_{k=1}^n g(\zeta_{k,n}) = n\int_0^1 g\big(s_n^*(t)\big)\,dt$$

($\xi_{k,n}$ are identically distributed). It follows from the above representation that

$$\frac{1}{n}\sum_{k=1}^n g(\zeta_{k,n}) \Rightarrow \int_0^1 g\big(w(t)\big)\,dt, \qquad n \to \infty.$$

20.2. The Law of the Iterated Logarithm 


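Let $\xi_1, \xi_2, \ldots$ be a sequence of independent random variables,

$$\mathbf{E}\xi_k = 0, \qquad \mathbf{E}\xi_k^2 = \sigma_k^2, \qquad \mathbf{E}|\xi_k|^3 = \mu_k,$$

$$S_n = \sum_{k=1}^n \xi_k, \qquad B_n^2 = \sum_{k=1}^n \sigma_k^2, \qquad M_n = \sum_{k=1}^n \mu_k.$$

In this notation, the Lyapunov ratio is equal to

$$L_{3,n} = L_3 = \frac{M_n}{B_n^3}.$$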


In the present section, we will show that the law of the iterated logarithm for the 
Wiener process and Theorem 20.1.2 imply the following. 


Theorem 20.2.1 (The law of the iterated logarithm for sums of random variables)
If $B_n \to \infty$ as $n \to \infty$ and $L_{3,n} \le c/\ln B_n$ for some $c < \infty$, then

$$\mathbf{P}\Big(\limsup_{n\to\infty} \frac{S_n}{B_n\sqrt{2\ln\ln B_n}} = 1\Big) = 1, \tag{20.2.1}$$

$$\mathbf{P}\Big(\liminf_{n\to\infty} \frac{S_n}{B_n\sqrt{2\ln\ln B_n}} = -1\Big) = 1. \tag{20.2.2}$$

Thus all the sequences which lie above

$$(1 + \varepsilon)B_n\sqrt{2\ln\ln B_n}$$

will be upper for the sequence of sums $S_n$, while all the sequences below

$$(1 - \varepsilon)B_n\sqrt{2\ln\ln B_n}$$

will be lower.




The conditions of the theorem will clearly be satisfied for identically distributed
$\xi_k$, for in that case $B_n^2 = \sigma_1^2 n$, $L_{3,n} = \mu_1/(\sigma_1^3\sqrt{n})$.


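Proof We turn to the proof of the law of the iterated logarithm in Theorem 19.3.2
and apply it to the sequence $S_n$. We will not need to introduce any essential changes.
One just has to consider $S_{n_k}$ instead of $w(a^k)$, where $n_k = \min\{n : B_n^2 \ge a^k\}$, and
replace $a^k$ with $B_{n_k}^2$ where it is needed. By the Lyapunov condition, $\max_{k \le n}\sigma_k^2 =
o(B_n^2)$, so that $B_{n_k}^2 \sim a^k$ as $k \to \infty$.

The key point in the proof of Theorem 19.3.2 is the proof of convergence (for
any $\varepsilon > 0$) of the series

$$\sum_k \mathbf{P}\Big(\sup_{u \le a^k} w(u) > (1 + \varepsilon)x_{k-1}\Big) \tag{20.2.3}$$

and divergence of the series

$$\sum_k \mathbf{P}\Big(w(a^k) - w(a^{k-1}) > \Big(1 - \frac{\varepsilon}{2}\Big)x_k\Big), \tag{20.2.4}$$

where

$$x_k = \sqrt{2a^k\ln\ln a^k}, \qquad w(a^k) - w(a^{k-1}) \overset{d}{=} w\big(a^k(1 - a^{-1})\big).$$

In our case, if one follows the same argument, one has to prove the convergence of
the series

$$\sum_k \mathbf{P}\big(\overline{S}_{n_k} > (1 + \varepsilon)y_{k-1}\big) \tag{20.2.5}$$

and divergence of the series

$$\sum_k \mathbf{P}\Big(S_{n_k} - S_{n_{k-1}} > \Big(1 - \frac{\varepsilon}{2}\Big)y_k\Big), \tag{20.2.6}$$

where $y_k = \sqrt{2B_{n_k}^2\ln\ln B_{n_k}} \sim x_k$. But the asymptotic behaviour of the probabilities
of the events in (20.2.3), (20.2.5) and (20.2.4), (20.2.6) under the conditions $L_{3,n} \le
c/\ln B_n$ will essentially be the same. To establish this, we will make use of the
inequality

$$\mathbf{P}\big(w \in G^{(-\delta)}\big) - \frac{cL_{3,n}}{\delta^3} \le \mathbf{P}(s_n \in G) \le \mathbf{P}\big(w \in G^{(\delta)}\big) + \frac{cL_{3,n}}{\delta^3}, \tag{20.2.7}$$

which follows from the proof of Theorem 20.1.3. By this inequality,

$$\mathbf{P}\Big(\frac{\overline{S}_n}{B_n} > (1 + 3\varepsilon)x\Big) \le \mathbf{P}\Big(\sup_{u \le 1} w(u) > (1 + 2\varepsilon)x\Big) + \frac{cL_{3,n}}{(\varepsilon x)^3}$$
$$= \mathbf{P}\Big(\sup_{u \le B_n^2} w(u) > (1 + 2\varepsilon)xB_n\Big) + \frac{cL_{3,n}}{(\varepsilon x)^3}.$$

Therefore (see (20.2.5)), putting $n := n_k$ and $x := y_{k-1}/B_{n_k}$, we obtain

$$\mathbf{P}\big(\overline{S}_{n_k} > (1 + 3\varepsilon)y_{k-1}\big) \le \mathbf{P}\Big(\sup_{u \le B_{n_k}^2} w(u) > (1 + 2\varepsilon)y_{k-1}\Big) + \frac{cL_{3,n_k}B_{n_k}^3}{(\varepsilon y_{k-1})^3}.$$

Here

$$y_{k-1} \sim x_{k-1}, \qquad B_{n_k}^2 \sim a^k, \qquad \ln\ln B_{n_k} \sim \ln\ln a^k \sim \ln k,$$

$$L_{3,n_k} \le \frac{c}{\ln B_{n_k}} \sim \frac{2c}{\ln a^k} = \frac{c_1}{k}.$$

Consequently, for all sufficiently large $k$ (recall that the letter $c$ denotes different
constants),

$$\mathbf{P}\big(\overline{S}_{n_k} > (1 + 3\varepsilon)y_{k-1}\big) \le \mathbf{P}\Big(\sup_{u \le a^k} w(u) > (1 + \varepsilon)x_{k-1}\Big) + \frac{c}{k(\ln k)^{3/2}}.$$

Since

$$\sum_{k=2}^\infty \frac{1}{k(\ln k)^{3/2}} < \infty, \tag{20.2.8}$$

the above inequality means that the convergence of series (20.2.3) implies that of
series (20.2.5). The first part of the theorem is proved.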
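The second part is proved in a similar way. Consider series (20.2.6). By (20.2.7),

$$\mathbf{P}\big(S_{n_k} - S_{n_{k-1}} > (1 - 3\varepsilon)y_k\big) = \mathbf{P}\Big(s_{n_k}(1) - s_{n_k}(r_k) > (1 - 3\varepsilon)\frac{y_k}{B_{n_k}}\Big)$$
$$\ge \mathbf{P}\Big(w(1) - w(r_k) > (1 - 2\varepsilon)\frac{y_k}{B_{n_k}}\Big) - cL_{3,n_k}\big(\ln\ln B_{n_k}^2\big)^{-3/2}, \tag{20.2.9}$$

where $r_k := B_{n_{k-1}}^2/B_{n_k}^2 \to a^{-1}$ due to the fact that

$$B_{n_k}^2 = a^k + \theta_k\sigma_{n_k}^2, \qquad 0 \le \theta_k \le 1, \qquad \sigma_{n_k}^2 = o\big(B_{n_k}^2\big).$$

The first term on the right-hand side of (20.2.9) is equal to

$$\mathbf{P}\Big(w(1 - r_k) > (1 - 2\varepsilon)\frac{y_k}{B_{n_k}}\Big) \ge \mathbf{P}\big(w(a^k(1 - r_k)) > (1 - \varepsilon)x_k\big)$$
$$= \mathbf{P}\Big(w\big(a^k(1 - a^{-1})\big) > (1 - \varepsilon)x_k\sqrt{\frac{1 - a^{-1}}{1 - r_k}}\Big)$$
$$\ge \mathbf{P}\Big(w(a^k) - w(a^{k-1}) > \Big(1 - \frac{\varepsilon}{2}\Big)x_k\Big).$$

As before, the series consisting of the second terms on the right-hand side of (20.2.9)
converges by virtue of (20.2.8). Therefore the established inequalities mean that
the divergence of series (20.2.4) implies that of series (20.2.6). The theorem is
proved.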




Now we will present an example that we need to complete the proof in Re- 
mark 4.4.1. 


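Example 20.2.1 Let $\zeta_k$ be independent and identically distributed, $\mathbf{E}\zeta_k = 0$,
$\mathbf{E}\zeta_k^2 = 1$, $\mathbf{E}|\zeta_k|^3 = \mu < \infty$ and $\xi_k = \sqrt{2k}\,\zeta_k$. Here we have $B_n^2 = n^2(1 + 1/n)$.
In Remark 4.4.1 we used the assertion that (in a somewhat different notation)

$$\mathbf{P}\Big(\bigcup_{n=1}^\infty \{S_n < -n\}\Big) = 1$$

or, which is the same (as the sign of $S_n$ is inessential),

$$\mathbf{P}\Big(\bigcup_{n=1}^\infty \{S_n > n\}\Big) = \mathbf{P}\Big(\bigcup_{n=1}^\infty \Big\{S_n > B_n\Big(1 + O\Big(\frac{1}{n}\Big)\Big)\Big\}\Big) = 1. \tag{20.2.10}$$

To verify it, we will show that any sequence of the form $B_n' = B_n(1 + O(1/n))$ is
lower for $\{S_k\}$. In our case,

$$M_n = \mu\sum_{k=1}^n (2k)^{3/2} \sim cn^{5/2}, \qquad L_{3,n} = \frac{M_n}{B_n^3} \sim cn^{-1/2} < \frac{c}{\ln n}.$$

This means that the conditions of Theorem 20.2.1 are met, and hence any sequence
which lies lower than $(1 - \varepsilon)n\sqrt{2\ln\ln n}$ (in particular, the sequence $B_n' = n$) is lower
for $\{S_n\}$. This proves (20.2.10).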
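The moment computations of the example are easy to check numerically. The sketch below is illustrative only: the function name is hypothetical and the default $\mu = 1$ corresponds to the assumed special case $\zeta_k = \pm 1$ with equal probabilities.

```python
import math

def example_moments(n, mu=1.0):
    """B_n^2, M_n and the Lyapunov ratio L_{3,n} = M_n / B_n^3 for xi_k = sqrt(2k) * zeta_k,
    where E zeta_k = 0, E zeta_k^2 = 1 and E|zeta_k|^3 = mu."""
    b2 = sum(2 * k for k in range(1, n + 1))                # Var xi_k = 2k
    m = mu * sum((2 * k) ** 1.5 for k in range(1, n + 1))   # E|xi_k|^3 = (2k)^{3/2} * mu
    return b2, m, m / b2 ** 1.5

for n in (10, 100, 1000, 10000):
    b2, m, ratio = example_moments(n)
    bn = math.sqrt(b2)
    print(n, b2 == n * (n + 1), round(ratio * math.sqrt(n), 4), round(ratio * math.log(bn), 4))
```

The third column stabilises (so $L_{3,n}$ decays like $n^{-1/2}$), while the last column, $L_{3,n}\ln B_n$, tends to zero, which is more than the condition $L_{3,n} \le c/\ln B_n$ of Theorem 20.2.1 requires.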


Let us return to Theorem 20.2.1. As we saw in Sect. 19.3, the proof of the law
of the iterated logarithm is based on the asymptotics (the rate of decrease) of the
function $1 - \Phi(x)$ as $x \to \infty$. Therefore, the conditions for the law of the iterated
logarithm for the sums $S_n$ are related to the width of the range of $x$ values for which
the probabilities

$$P_{n+}(x) := \mathbf{P}\Big(\frac{S_n}{B_n} > x\Big)$$

are approximated by the normal law (i.e. by the function $1 - \Phi(x)$). Here we encounter
the problem of large deviations (see Chap. 9). If

$$\frac{P_{n+}(x)}{1 - \Phi(x)} \to 1 \tag{20.2.11}$$

as $n \to \infty$ for all

$$x \le \sqrt{2\ln\ln B_n}\,(1 - \varepsilon) \tag{20.2.12}$$

and some $\varepsilon > 0$ then the proof of the law of the iterated logarithm for the Wiener
process given in Sect. 19.3 can easily be extended to the sums $S_n/B_n$ (to estimate
$\mathbf{P}(\overline{S}_n/B_n > x)$ one has to use the Kolmogorov inequality; see Corollary 11.2.1).
One way to establish (20.2.11) and (20.2.12) is to use estimates for the rate
of convergence in the central limit theorem. This approach was employed in the
proof of Theorem 20.2.1, where we used Theorem 20.1.3. However, to ensure that
(20.2.11) and (20.2.12) hold one can use weaker assertions than Theorem 20.1.3. To
some extent, this fact is illustrated by the following assertion (see [32]):


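Theorem 20.2.2 If $B_n \to \infty$ and $B_{n+1}/B_n \to 1$ as $n \to \infty$, and

$$\sup_x \Big|\mathbf{P}\Big(\frac{S_n}{B_n} < x\Big) - \Phi(x)\Big| \le c(\ln B_n)^{-1-\delta}$$

for some $\delta > 0$ and $c < \infty$, then the law of the iterated logarithm holds.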


If $\xi_k \overset{d}{=} \xi$ are identically distributed then Theorem 20.2.1 implies that the law of
the iterated logarithm is valid whenever $\mathbf{E}|\xi|^3$ exists. In fact, however, for identically
distributed $\xi_k$, the law of the iterated logarithm always holds in the case of a finite
second moment, without any additional conditions.


Theorem 20.2.3 (Hartman–Wintner, [32]) If the $\xi_k$ are identically distributed,
$\mathbf{E}\xi_k = 0$, and $\mathbf{E}\xi_k^2 = 1$, then (20.2.1) and (20.2.2) hold with $B_n^2$ replaced with $n$.
Every point from the segment $[-1, 1]$ is a limiting one for the sequence

$$\frac{S_n}{\sqrt{2n\ln\ln n}}, \qquad n > 1.$$

The last assertion of the theorem means that, for each $t \in [-1, 1]$ and any $\varepsilon > 0$,
the interval $(t - \varepsilon, t + \varepsilon)$ contains, with probability 1, infinitely many elements of
the sequence

$$\frac{S_n}{\sqrt{2n\ln\ln n}}.$$


20.3 Convergence to the Poisson Process 


20.3.1 Convergence of the Processes of Cumulative Sums 


The theorems of Sects. 20.1 and 20.2 show that the Wiener process describes rather
well the evolution of the cumulative sums when summing “conventional” random
variables $\xi_{k,n}$ satisfying the Lyapunov condition. It turns out that the Poisson process
describes in a similar way the evolution of the cumulative sums when the random
variables $\xi_{k,n}$ correspond to the occurrence of rare events.

As in Sect. 5.4, first we will not consider the triangular array scheme, but obtain
precise inequalities describing the proximity of the processes under study. Consider
independent random variables $\xi_1, \ldots, \xi_n$ with Bernoulli distributions:

$$\mathbf{P}(\xi_k = 1) = p_k, \qquad \mathbf{P}(\xi_k = 0) = 1 - p_k, \qquad \sum_{k=1}^n p_k = \mu.$$

We will assume that $p := \max_{k \le n} p_k$ is small and the number $\mu$ is “comparable
with 1”. Put

$$q_0 := 0, \qquad q_k := \frac{p_k}{\mu}, \qquad Q_k := \sum_{j=0}^k q_j, \qquad k \ge 0,$$

and form a random function $s_n(t)$ on $[0, 1]$ in the following way. Put $s_n(0) := 0$,

$$s_n(t) := S_k = \sum_{j=1}^k \xi_j \quad \text{for } t \in (Q_{k-1}, Q_k], \qquad k = 1, \ldots, n.$$

j=l 


Here it is more convenient to use a step function rather than a continuous trajectory
$s_n(t)$ (cf. Remark 20.1.2). The assertions to be obtained in this section are similar to
the invariance principle and state that the process $s_n(t)$ converges in a certain sense
to the Poisson process $\pi(t)$ with intensity $\mu$ on $[0, 1]$. This convergence could of
course be treated as weak convergence of distributions in the metric space $D(0, 1)$.
But in the framework of the present book, it is apparently inexpedient for at least
two reasons:

1. To do that, we would have to introduce a metric in $D(0, 1)$ and study its properties,
which is somewhat complicated by itself.

2. The trajectories of the processes $s_n(t)$ and $\pi(t)$ are of a simple form, and
characterising their closeness can be done in a simpler and more precise way without
using more general concepts. Indeed, as we saw, the trajectory of $\pi(t)$ on $[0, 1]$ is
completely determined by the collection of random variables $(\pi(1); T_1, \ldots, T_{\pi(1)})$,
where $T_k$ is the epoch of the $k$-th jump of the process, the differences $T_{k+1} - T_k$ being
exponentially distributed with parameter $\mu$. A similar
characterisation is valid for the trajectories of $s_n(t)$: they are determined by the
vector $(s_n(1), \theta_1, \ldots, \theta_{s_n(1)})$, where $\theta_k = Q_{\gamma_k}$, and $\gamma_1, \gamma_2, \ldots$ are the values $j$ for which
$\xi_j = 1$. We will say that the distributions of $s_n(t)$ and $\pi(t)$ are close to each other if
the distributions of the above vectors are close. This convention will correspond to
convergence of the processes in a rather strong and natural sense.

It is not hard to see from what we said before about the Poisson processes (see
Sect. 19.4) that the introduced convergence of the distributions of the jump points of
the process $s_n(t)$ is equivalent to convergence of the finite-dimensional distributions
of $s_n(t)$ to those of $\pi(t)$ (we know that the trajectories of $s_n(t)$ are step functions).


Theorem 20.3.1 The processes $s_n(t)$ and $\pi(t)$ can be constructed on a common
probability space so that

$$P\bigl(s_n(1) = \pi(1);\ \theta_k - q_{\gamma_k} < T_k \le \theta_k,\ k = 1, \ldots, \pi(1)\bigr) \ge 1 - \sum_{j=1}^{n} p_j^2. \tag{20.3.1}$$


Since $\sum_{j=1}^{n} p_j^2 \le \mu p$, the smallness of $p$ means that, with probability close to 1,
the values of $s_n(1)$ and $\pi(1)$ coincide (cf. Theorem 5.4.2) and the positions of the
respective points of jumps of the processes $s_n(t)$ and $\pi(t)$ do not differ much.
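The first part of this remark can be checked numerically. Since $s_n(1)$ and $\pi(1)$ coincide with probability at least $1 - \sum p_j^2$, the total variation distance between their distributions cannot exceed $\sum p_j^2$. The following minimal Python sketch computes that distance exactly; the function names and the particular values of $p_k$ are hypothetical and chosen for illustration only.

```python
import math

def bernoulli_sum_pmf(ps):
    """Exact pmf of s_n(1) = xi_1 + ... + xi_n for independent Bernoulli(p_k)."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)
            nxt[k + 1] += mass * p
        pmf = nxt
    return pmf

def poisson_pmf(mu, k):
    """P(pi(1) = k) for a Poisson variable with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

def total_variation(ps):
    """sup_A |P(s_n(1) in A) - P(pi(1) in A)| with pi(1) Poisson, mean sum(ps)."""
    mu, n = sum(ps), len(ps)
    pmf = bernoulli_sum_pmf(ps)
    body = sum(abs(pmf[k] - poisson_pmf(mu, k)) for k in range(n + 1))
    # Poisson mass above n, where the Bernoulli sum has no mass at all.
    tail = max(0.0, 1.0 - sum(poisson_pmf(mu, k) for k in range(n + 1)))
    return 0.5 * (body + tail)

ps = [0.02 * (1 + k % 5) for k in range(40)]   # hypothetical p_k, largest is 0.1
mu, p_max = sum(ps), max(ps)
bound = sum(p * p for p in ps)
tv = total_variation(ps)
print(f"mu={mu:.3f}  TV={tv:.5f}  sum p_j^2={bound:.5f}  mu*p={mu * p_max:.5f}")
```

The printed values illustrate the chain of inequalities: the distance is at most $\sum p_j^2$, which is at most $\mu p$.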




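Put $q := p/\mu$ and, for a fixed $k \ge 1$, denote by $B^{(q)}$ the $q$-neighbourhood of the
orthant set $B := \{(x_1, \ldots, x_k) : x_j \le v_j,\ j \le k\}$ for some $v_j > 0$. Theorem 20.3.1
implies the following.

Corollary 20.3.1 For any $k = 1, \ldots, n$,

$$P\bigl(s_n(1) = k,\ (\theta_1, \ldots, \theta_k) \in B\bigr) \le P\bigl(\pi(1) = k,\ (T_1, \ldots, T_k) \in B\bigr) + \sum_{j=1}^{n} p_j^2,$$

$$P\bigl(\pi(1) = k,\ (T_1, \ldots, T_k) \in B\bigr) \le P\bigl(s_n(1) = k,\ (\theta_1, \ldots, \theta_k) \in B^{(q)}\bigr) + \sum_{j=1}^{n} p_j^2.$$

Proof Let $A_n$ denote the event appearing on the left-hand side of (20.3.1),

$$D_n := \{s_n(1) = k,\ (\theta_1, \ldots, \theta_k) \in B\}, \qquad C_n := \{\pi(1) = k,\ (T_1, \ldots, T_k) \in B\}.$$

Then, by virtue of (20.3.1),

$$\begin{aligned}
P(D_n) &\le P(D_n A_n) + \sum_{j=1}^{n} p_j^2 \\
&\le P\bigl(D_n,\ \pi(1) = k,\ (T_1, \ldots, T_k) \in B\bigr) + \sum_{j=1}^{n} p_j^2 \\
&\le P\bigl(\pi(1) = k,\ (T_1, \ldots, T_k) \in B\bigr) + \sum_{j=1}^{n} p_j^2.
\end{aligned}$$

The converse inequality is established similarly. The corollary is proved.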


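Proof of Theorem 20.3.1 Let $\eta_k := \pi(Q_k) - \pi(Q_{k-1})$, $k = 1, \ldots, n$. The theorem
will be proved if we construct $\{\xi_k\}$ and $\{\eta_k\}$ on a common probability space so that

$$P\Bigl(\bigcup_{k=1}^{n} \{\xi_k \ne \eta_k\}\Bigr) \le \sum_{j=1}^{n} p_j^2. \tag{20.3.2}$$

A construction leading to (20.3.2) has essentially already been used in Theorem 5.4.2.
The required construction will be obtained if we consider independent
random variables $\omega_1, \ldots, \omega_n$; $\omega_k \mathrel{\subset\!\!\!=} \mathbf{U}_{0,1}$, and put

$$\xi_k := \begin{cases} 0 & \text{if } \omega_k < 1 - p_k, \\ 1 & \text{if } \omega_k \ge 1 - p_k, \end{cases}
\qquad
\eta_k := \begin{cases} 0 & \text{if } \omega_k < e^{-p_k} =: \pi_{0,k}, \\ j \ge 1 & \text{if } \omega_k \in [\pi_{j-1,k}, \pi_{j,k}), \end{cases}$$

where $\pi_{j,k} = \boldsymbol{\Pi}_{p_k}([0, j])$, $j = 0, 1, \ldots$. Then $\eta_k \mathrel{\subset\!\!\!=} \boldsymbol{\Pi}_{p_k}$, $\sum_{k=1}^{n} \eta_k \mathrel{\subset\!\!\!=} \boldsymbol{\Pi}_\mu$,

$$\{\xi_k \ne \eta_k\} = \bigl\{\omega_k \in [1 - p_k, e^{-p_k}) \cup [e^{-p_k} + p_k e^{-p_k}, 1]\bigr\}.$$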




Therefore,

$$P(\xi_k \ne \eta_k) \le p_k^2$$

and we get (20.3.2). The theorem is proved.
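The coupling used in the proof is easy to reproduce. The sketch below is a minimal illustration only: the function names, the value $p = 0.1$ and the grid size are assumptions made for the example. It builds the pair $(\xi, \eta)$ from a single uniform $\omega$ and compares the length of the set $\{\xi \ne \eta\}$, which equals $p(1 - e^{-p})$, with the bound $p^2$.

```python
import math

def coupled_pair(omega, p):
    """Build (xi, eta) from one uniform omega as in the proof of Theorem 20.3.1."""
    xi = 0 if omega < 1.0 - p else 1
    # eta = j when omega lies in [pi_{j-1}, pi_j), pi_j being the Poisson(p) cdf at j.
    j, term = 0, math.exp(-p)
    cdf = term
    while omega >= cdf and j < 200:   # the cap only guards against rounding near omega = 1
        j += 1
        term *= p / j
        cdf += term
    return xi, j

def mismatch_probability(p):
    """Length of {omega: xi != eta} = [1 - p, e^{-p}) U [e^{-p}(1 + p), 1]."""
    return (math.exp(-p) - (1.0 - p)) + (1.0 - math.exp(-p) * (1.0 + p))

p = 0.1                 # hypothetical success probability
exact = mismatch_probability(p)

# Cross-check on a fine grid of omega values in (0, 1).
grid = 100_000
hits = 0
for i in range(grid):
    xi, eta = coupled_pair((i + 0.5) / grid, p)
    if xi != eta:
        hits += 1

print(exact, hits / grid, p * p)
```

Summing the bound $p_k^2$ over $k$ gives the right-hand side of (20.3.2).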


If we now consider the triangular array scheme $\xi_{1,n}, \ldots, \xi_{n,n}$, for which

$$P(\xi_{k,n} = 1) = p_{k,n}, \qquad P(\xi_{k,n} = 0) = 1 - p_{k,n},$$

$$\sum_{k=1}^{n} p_{k,n} =: \mu_n \to \mu, \qquad p_n := \max_{k \le n} p_{k,n} \to 0$$

as $n \to \infty$, then Theorem 20.3.1 easily implies convergence of the finite-dimensional
distributions of the processes $s_n(t)$ to those of $\pi(t)$, where $s_n(t)$ is constructed as before
and $\pi(t)$ is the Poisson process with parameter $\mu$. Consider, for example, the
two-dimensional distributions $P(s_n(t) \ge j,\ s_n(1) = k)$ for $t \in (0, 1)$, $j \le k$. In the
notation of Theorem 20.3.1 (to be precise, we have to add the subscript $n$ where
appropriate; e.g., the Poisson processes with parameters $\mu_n$ and $\mu$ will be denoted
by $\pi_n(t)$ and $\pi(t)$, respectively), we obtain

$$P(s_n(t) \ge j,\ s_n(1) = k) = P(s_n(1) = k,\ \theta_j \le t).$$

By Corollary 20.3.1 the right-hand side does not exceed

$$P(\pi_n(1) = k,\ T_j \le t) + \sum_{l=1}^{n} p_{l,n}^2,$$

where, as is easy to see,

$$\begin{aligned}
P(\pi_n(1) = k,\ T_j \le t) &= P(\pi_n(1) = k,\ \pi_n(t) \ge j) \\
&= \sum_{l=j}^{k} P(\pi_n(t) = l)\, P(\pi_n(1 - t) = k - l) \\
&\to P(\pi(1) = k,\ \pi(t) \ge j)
\end{aligned}$$

as $n \to \infty$, so that

$$P(s_n(t) \ge j,\ s_n(1) = k) \le P(\pi(t) \ge j,\ \pi(1) = k) + o(1).$$

The converse inequality is established in a similar way (by using the convergence
$q_n \to 0$ as $n \to \infty$). The required convergence of the finite-dimensional
distributions is proved.


20.3.2 Convergence of Sums of Thinning Renewal Processes 


The Poisson process can appear as a limiting one in a somewhat different set-up—as 
a limit for the sum of a large number of homogeneous “slow” renewal processes. 




We formulate the setting of the problem more precisely. Let $\eta_i(t)$, $i = 1, 2, \ldots, n$,
be mutually independent arbitrary homogeneous renewal processes in the “triangular
array scheme” (i.e. they depend on $n$) generated by sequences $\{\tau_k^{(i)}\}_{k=1}^{\infty}$, for which
(see Chap. 10; $\tau_k^{(i)} \overset{d}{=} \tau^{(i)}$ for $k \ge 2$)

$$E\eta_i(t) = \frac{t}{a_i}, \qquad a_i := a_{i,n} = E\tau^{(i)} \to \infty, \qquad \sum_{i=1}^{n} \frac{1}{a_i} \to \mu$$

for a fixed $\mu$, and

$$F_i(t) := P(\tau^{(i)} < t) \le r_{t,n} \to 0$$

for any fixed $t$ as $n \to \infty$, where $r_{t,n}$ does not depend on $i$.


Theorem 20.3.2 Under the above conditions, the finite-dimensional distributions
of the process

$$\zeta_n(t) := \sum_{i=1}^{n} \eta_i(t)$$

converge as $n \to \infty$ to those of the Poisson process $\zeta(t)$ with the parameter $\mu$: for
any $l \ge 1$, $0 \le k_1 \le k_2 \le \cdots \le k_l$,

$$P(\zeta_n(t_1) = k_1, \ldots, \zeta_n(t_l) = k_l) \to P(\zeta(t_1) = k_1, \ldots, \zeta(t_l) = k_l).$$

(On convergence to the Poisson process, see the remark preceding Theorem 20.3.1.)


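Proof First we will prove convergence of the distributions of the increments

$$\zeta_n(t + u) - \zeta_n(u)$$

to the Poisson distribution with parameter $\mu t$. Put $\Delta_i := \eta_i(t + u) - \eta_i(u)$,
$p_i := t/a_i$. We have ($\chi_i(u)$ is the excess for the process $\eta_i$; see Sects. 10.2, 10.4)

$$E\Delta_i = p_i,$$

$$\begin{aligned}
P(\Delta_i \ge l) &\le P(\chi_i(u) < t)\,\bigl[P(\tau^{(i)} < t)\bigr]^{l-1} \\
&\le \frac{1}{a_i} \int_0^t P(\tau^{(i)} > z)\,dz \cdot F_i(t)^{l-1} \le \frac{t}{a_i}\, r_{t,n}^{\,l-1} = p_i\, r_{t,n}^{\,l-1}.
\end{aligned} \tag{20.3.3}$$

This implies that

$$E\Delta_i = p_i = \sum_{l=1}^{\infty} l\,P(\Delta_i = l) = P(\Delta_i = 1) + o(p_i),$$

$$P(\Delta_i = 1) = p_i + o(p_i), \qquad P(\Delta_i = 0) = 1 - p_i + o(p_i).$$

Therefore the conditions of Corollary 5.4.2 are met, which implies that

$$\zeta_n(t + u) - \zeta_n(u) = \sum_{i=1}^{n} \Delta_i \mathrel{\subset\!\!\!=\!\!\Rightarrow} \boldsymbol{\Pi}_{\mu t}. \tag{20.3.4}$$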




It remains to prove the asymptotic independence of the increments. For simplicity’s
sake, consider only two increments, on the intervals $(0, u)$ and $(u, u + t)$, and
assume that $\zeta_n(u) = k$. Moreover, suppose that the following event $A$ occurred: the
renewals occurred in the processes with numbers $i_1, \ldots, i_k$. It suffices to verify that,
given this condition, (20.3.4) will still remain true. Let $B$ be the event that there
again were renewals on the interval $(u, u + t)$ in the processes with the numbers
$i_1, \ldots, i_k$. Evidently,

$$P(B \mid A) \le \sum_{l=1}^{k} P(\tau^{(i_l)} < t + u) \le k\, r_{t+u,n} \to 0.$$

Thus the contribution of the processes $\eta_{i_l}$, $l = 1, \ldots, k$, to the sum (20.3.4) given
condition $A$ is negligibly small. Consider the remaining $n - k$ processes. For them,

$$\begin{aligned}
P(\Delta_i \ge 1 \mid A) &= \frac{P(\chi_i(0) \in (u, u + t))}{P(\chi_i(0) > u)} \\
&= \frac{1}{a_i} \int_u^{u+t} (1 - F_i(z))\,dz \left[1 - \frac{1}{a_i} \int_0^u (1 - F_i(z))\,dz\right]^{-1} \\
&= p_i + o(p_i).
\end{aligned} \tag{20.3.5}$$

Since relation (20.3.3) remains true for conditional distributions of $\Delta_i$ (given $A$
and for $i \ne i_l$, $l = 1, \ldots, k$), we obtain, similarly to the above argument (using now
instead of the equality $\sum_{l=1}^{\infty} l\,P(\Delta_i = l) = p_i$ the relation $\sum_{l=1}^{\infty} l\,P(\Delta_i = l \mid A) = p_i + o(p_i)$,
which follows from (20.3.5)), that

$$P(\Delta_i = 1 \mid A) = p_i + o(p_i), \qquad P(\Delta_i = 0 \mid A) = 1 - p_i + o(p_i).$$

It remains to once again make use of Corollary 5.4.2.


Chapter 21 
Markov Processes 


Abstract This chapter presents the fundamentals of the theory of general Markov 
processes in continuous time. Section 21.1 contains the definitions and a discus- 
sion of the Markov property and transition functions, and derives the Chapman— 
Kolmogorov equation. Section 21.2 studies Markov processes in countable state 
spaces, deriving systems of backward and forward differential equations for tran- 
sition probabilities. It also establishes the ergodic theorem and contains examples 
illustrating the presented theory. Section 21.3 deals with continuous time branch- 
ing processes. Then the elements of the general theory of semi-Markov processes 
are presented in Sect. 21.4, including the ergodic theorem and some other related 
results for such processes. Section 21.5 discusses the so-called regenerative pro- 
cesses, establishing their ergodicity and the Laws of Large Numbers and Central 
Limit Theorem for integrals of functions of their trajectories. Section 21.6 is devoted 
to diffusion processes. It begins with the classical definition of diffusion, derives the 
forward and backward Kolmogorov equations for the transition probability function 
of a diffusion process, and gives a couple of examples of using the equations to 
compute important characteristics of the respective processes. 


21.1 Definitions and General Properties 


Markov processes in discrete time (Markov chains) were considered in Chap. 13. 
Recall that their main property was independence of the “future” of the process of 
its “past” given its “present” is fixed. The same principle underlies the definition of 
Markov processes in the general case. 


21.1.1 Definition and Basic Properties 


Let $(\Omega, \mathfrak{F}, P)$ be a probability space and $\{\xi(t) = \xi(t, \omega),\ t \ge 0\}$ a random process
given on it. Set

$$\mathfrak{F}_t := \sigma(\xi(u);\ u \le t), \qquad \mathfrak{F}_{[t,\infty)} := \sigma(\xi(u);\ u \ge t),$$


A.A. Borovkov, Probability Theory, Universitext, 579 
DOI 10.1007/978-1-4471-5201-9_21, © Springer-Verlag London 2013 




so that the variable $\xi(u)$ is $\mathfrak{F}_t$-measurable for $u \le t$ and $\mathfrak{F}_{[t,\infty)}$-measurable for $u \ge t$.
The $\sigma$-algebra $\sigma(\mathfrak{F}_t, \mathfrak{F}_{[t,\infty)})$ is generated by the variables $\xi(u)$ for all $u$ and may
coincide with $\mathfrak{F}$ in the case of the sample probability space.

Definition 21.1.1 We say that $\xi(t)$ is a Markov process if, for any $t$, $A \in \mathfrak{F}_t$, and
$B \in \mathfrak{F}_{[t,\infty)}$, we have

$$P(AB \mid \xi(t)) = P(A \mid \xi(t))\, P(B \mid \xi(t)). \tag{21.1.1}$$

This expresses precisely the fact that the future is independent of the past when the
present is fixed (conditional independence of $\mathfrak{F}_t$ and $\mathfrak{F}_{[t,\infty)}$ given $\xi(t)$).


We will now show that the above definition is equivalent to the following. 


Definition 21.1.2 We say that $\xi(t)$ is a Markov process if, for any bounded
$\mathfrak{F}_{[t,\infty)}$-measurable random variable $\eta$,

$$E(\eta \mid \mathfrak{F}_t) = E(\eta \mid \xi(t)). \tag{21.1.2}$$

It suffices to take $\eta$ to be functions of the form $\eta = f(\xi(s))$ for $s \ge t$.


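Proof of the equivalence Let (21.1.1) hold. By the monotone convergence theorem
it suffices to prove (21.1.2) for simple functions $\eta$. To this end it suffices, in turn,
to prove (21.1.2) for $\eta = I_B$, the indicator of the set $B \in \mathfrak{F}_{[t,\infty)}$. Let $A \in \mathfrak{F}_t$. Then,
by (21.1.1),

$$\begin{aligned}
P(AB) &= E\,P(AB \mid \xi(t)) = E\bigl[P(A \mid \xi(t))\, P(B \mid \xi(t))\bigr] \\
&= E\,E\bigl[I_A P(B \mid \xi(t)) \bigm| \xi(t)\bigr] = E\bigl[I_A P(B \mid \xi(t))\bigr].
\end{aligned} \tag{21.1.3}$$

On the other hand,

$$P(AB) = E[I_A I_B] = E\bigl[I_A P(B \mid \mathfrak{F}_t)\bigr]. \tag{21.1.4}$$

Because (21.1.3) and (21.1.4) hold for any $A \in \mathfrak{F}_t$, this means that
$P(B \mid \mathfrak{F}_t) = P(B \mid \xi(t))$.

Conversely, let (21.1.2) hold. Then, for $A \in \mathfrak{F}_t$ and $B \in \mathfrak{F}_{[t,\infty)}$, we have

$$\begin{aligned}
P(AB \mid \xi(t)) &= E\bigl[E(I_A I_B \mid \mathfrak{F}_t) \bigm| \xi(t)\bigr] = E\bigl[I_A E(I_B \mid \mathfrak{F}_t) \bigm| \xi(t)\bigr] \\
&= E\bigl[I_A E(I_B \mid \xi(t)) \bigm| \xi(t)\bigr] = P(B \mid \xi(t))\, P(A \mid \xi(t)).
\end{aligned}$$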


It remains to verify that it suffices to take $\eta = f(\xi(s))$, $s \ge t$, in (21.1.2). In order
to do this, we need one more equivalent definition of a Markov process.

Definition 21.1.3 We say that $\xi(t)$ is a Markov process if, for any bounded function
$f$ and any $t_1 < t_2 < \cdots < t_n \le t$,

$$E\bigl(f(\xi(t)) \bigm| \xi(t_1), \ldots, \xi(t_n)\bigr) = E\bigl(f(\xi(t)) \bigm| \xi(t_n)\bigr). \tag{21.1.5}$$




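Proof of the equivalence Relation (21.1.5) follows in an obvious way from (21.1.2).
Now assume that (21.1.5) holds. Then, for any $A \in \sigma(\xi(t_1), \ldots, \xi(t_n))$,

$$E\bigl(f(\xi(t));\ A\bigr) = E\bigl[E\bigl(f(\xi(t)) \bigm| \xi(t_n)\bigr);\ A\bigr]. \tag{21.1.6}$$

Both parts of (21.1.6) are measures coinciding on the algebra of cylinder sets. Therefore,
by the theorem on uniqueness of extension of a measure, they coincide on the
$\sigma$-algebra generated by these sets, i.e. on $\mathfrak{F}_{t_n}$. In other words, (21.1.6) holds for any
$A \in \mathfrak{F}_{t_n}$, which is equivalent to the equality

$$E\bigl[f(\xi(t)) \bigm| \mathfrak{F}_{t_n}\bigr] = E\bigl[f(\xi(t)) \bigm| \xi(t_n)\bigr]$$

for any $t_n \le t$. Relation (21.1.2) for $\eta = f(\xi(t))$ is proved.

We now prove that in (21.1.2) it suffices to take $\eta = f(\xi(s))$, $s \ge t$. Let
$t \le u_1 < \cdots < u_n$. We prove that then (21.1.2) is true for

$$\eta = \prod_{i=1}^{n} f_i(\xi(u_i)). \tag{21.1.7}$$

We will make use of induction and assume that equality (21.1.2) holds for the
functions

$$\gamma = \prod_{i=1}^{n-1} f_i(\xi(u_i))$$

(for $n = 1$ relation (21.1.2) is true). Then, putting
$g(\xi(u_{n-1})) := E\bigl[f_n(\xi(u_n)) \bigm| \xi(u_{n-1})\bigr]$, we obtain

$$\begin{aligned}
E(\eta \mid \mathfrak{F}_t) &= E\bigl[E(\eta \mid \mathfrak{F}_{u_{n-1}}) \bigm| \mathfrak{F}_t\bigr] = E\bigl[\gamma\, E\bigl(f_n(\xi(u_n)) \bigm| \mathfrak{F}_{u_{n-1}}\bigr) \bigm| \mathfrak{F}_t\bigr] \\
&= E\bigl[\gamma\, E\bigl(f_n(\xi(u_n)) \bigm| \xi(u_{n-1})\bigr) \bigm| \mathfrak{F}_t\bigr] = E\bigl[\gamma\, g(\xi(u_{n-1})) \bigm| \mathfrak{F}_t\bigr].
\end{aligned}$$

By the induction hypothesis this implies that

$$E(\eta \mid \mathfrak{F}_t) = E\bigl[\gamma\, g(\xi(u_{n-1})) \bigm| \xi(t)\bigr]$$

and, therefore, that $E(\eta \mid \mathfrak{F}_t)$ is $\sigma(\xi(t))$-measurable and

$$E(\eta \mid \xi(t)) = E\bigl(E(\eta \mid \mathfrak{F}_t) \bigm| \xi(t)\bigr) = E(\eta \mid \mathfrak{F}_t).$$

We proved that (21.1.2) holds for $\sigma(\xi(u_1), \ldots, \xi(u_n))$-measurable functions of
the form (21.1.7). By passing to the limit we establish first that (21.1.2) holds for
simple functions, and then that it holds for any $\mathfrak{F}_{[t,\infty)}$-measurable functions.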


21.1.2 Transition Probability 


We saw that, for a Markov process $\xi(t)$, the conditional probability

$$P(\xi(t) \in B \mid \mathfrak{F}_s) = P(\xi(t) \in B \mid \xi(s)) \quad \text{for } t > s$$

is a Borel function of $\xi(s)$ which we will denote by

$$P(s, \xi(s); t, B) := P(\xi(t) \in B \mid \xi(s)).$$

One can say that $P(s, x; t, B)$ as a function of $B$ and $x$ is the conditional distribution
(see Sect. 4.9) of $\xi(t)$ given that $\xi(s) = x$. By the Markov property, it satisfies the
relation ($s < u < t$)

$$P(s, x; t, B) = \int P(s, x; u, dy)\, P(u, y; t, B), \tag{21.1.8}$$

which follows from the equality

$$P(\xi(t) \in B \mid \xi(s) = x) = E\bigl[P(\xi(t) \in B \mid \mathfrak{F}_u) \bigm| \xi(s) = x\bigr] = E\bigl[P(u, \xi(u); t, B) \bigm| \xi(s) = x\bigr].$$

Equation (21.1.8) is called the Chapman–Kolmogorov equation.

The function $P(s, x; t, B)$ can be used in an analytic definition of a Markov process.
First we need to clarify what properties a function $P_{x,B}(s, t)$ should possess in
order that there exists a Markov process $\xi(t)$ for which

$$P_{x,B}(s, t) = P(s, x; t, B).$$

Let $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ be a measurable space.

Definition 21.1.4 A function $P_{x,B}(s, t)$ is said to be a transition function on
$(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ if it satisfies the following conditions:

(1) As a function of $B$, $P_{x,B}(s, t)$ is a probability distribution for each $s \le t$, $x \in \mathcal{X}$.
(2) $P_{x,B}(s, t)$ is measurable in $x$ for each $s \le t$ and $B \in \mathfrak{B}_{\mathcal{X}}$.
(3) For $0 \le s < u < t$ and all $x$ and $B$,

$$P_{x,B}(s, t) = \int P_{x,dy}(s, u)\, P_{y,B}(u, t)$$

(the Chapman–Kolmogorov equation).
(4) $P_{x,B}(s, t) = I_B(x)$ for $s = t$.

Here properties (1) and (2) ensure that $P_{x,B}(s, t)$ can be a conditional distribution
(cf. Sect. 4.9).

Now define, with the help of $P_{x,B}(s, t)$, the finite-dimensional distributions of a
process $\xi(t)$ with the initial condition $\xi(0) = a$ by the formula

$$P(\xi(t_1) \in dy_1, \ldots, \xi(t_n) \in dy_n) = P_{a,dy_1}(0, t_1)\, P_{y_1,dy_2}(t_1, t_2) \cdots P_{y_{n-1},dy_n}(t_{n-1}, t_n). \tag{21.1.9}$$

By virtue of properties (3) and (4), these distributions are consistent and therefore
by the Kolmogorov theorem define a process $\xi(t)$ in $(\mathbb{R}^T, \mathfrak{B}_{\mathbb{R}}^T)$, where $T = [0, \infty)$.




By formula (21.1.9) and rule (21.1.5),

$$\begin{aligned}
P\bigl(\xi(t_n) \in B_n \bigm| (\xi(t_1), \ldots, \xi(t_{n-1})) = (y_1, \ldots, y_{n-1})\bigr)
&= P_{y_{n-1},B_n}(t_{n-1}, t_n) = P(\xi(t_n) \in B_n \mid \xi(t_{n-1}) = y_{n-1}) \\
&= P(t_{n-1}, y_{n-1}; t_n, B_n).
\end{aligned}$$

We could also verify this equality in a more formal way using the fact that the
integrals of both sides over the set $\{\xi(t_1) \in B_1, \ldots, \xi(t_{n-1}) \in B_{n-1}\}$ coincide.

Thus, by virtue of Definition 21.1.3, we have constructed a Markov process $\xi(t)$
for which

$$P(s, x; t, B) = P_{x,B}(s, t).$$

This function will also be called the transition function (or transition probability) of
the process $\xi(t)$.

Definition 21.1.5 A Markov process $\xi(t)$ is said to be homogeneous if $P(s, x; t, B)$,
as a function of $s$ and $t$, depends on the difference $t - s$ only:

$$P(s, x; t, B) = P(t - s; x, B).$$

This is the probability of transition during a time interval of length $t - s$ from $x$
to $B$. If

$$P(u; x, B) = \int_B p(u; x, y)\,dy$$

then the function $p(u; x, y)$ is said to be a transition density.


It is not hard to see that the Wiener and Poisson processes are both homogeneous
Markov processes. For example, for the Wiener process,

$$p(u; x, y) = \frac{1}{\sqrt{2\pi u}}\, e^{-(x - y)^2/(2u)}.$$
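For a homogeneous process with a transition density, the Chapman–Kolmogorov equation (21.1.8) reads $p(s + t; x, z) = \int p(s; x, y)\, p(t; y, z)\,dy$. The following minimal Python sketch checks this numerically for the Wiener density above. The function names, the trapezoidal rule, the integration window and the sample arguments are assumptions made for the illustration.

```python
import math

def wiener_density(u, x, y):
    """Transition density p(u; x, y) of the standard Wiener process, u > 0."""
    return math.exp(-(x - y) ** 2 / (2.0 * u)) / math.sqrt(2.0 * math.pi * u)

def chapman_kolmogorov_rhs(s, t, x, z, half_width=12.0, steps=4000):
    """Trapezoidal approximation of the integral of p(s; x, y) p(t; y, z) over y."""
    a, b = x - half_width, x + half_width      # the integrand is negligible outside
    h = (b - a) / steps
    total = 0.5 * (wiener_density(s, x, a) * wiener_density(t, a, z)
                   + wiener_density(s, x, b) * wiener_density(t, b, z))
    for i in range(1, steps):
        y = a + i * h
        total += wiener_density(s, x, y) * wiener_density(t, y, z)
    return total * h

s, t, x, z = 0.7, 1.3, 0.2, -0.5               # hypothetical sample arguments
lhs = wiener_density(s + t, x, z)
rhs = chapman_kolmogorov_rhs(s, t, x, z)
print(lhs, rhs)
```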


21.2 Markov Processes with Countable State Spaces. Examples 


21.2.1 Basic Properties of the Process 


Assume without loss of generality that the “discrete state space” X coincides with 
the set of integers {0, 1, 2,...}. For simplicity’s sake we will only consider homo- 
geneous Markov processes. 

The transition function of such a process is determined by the collection of
functions $P(t; i, j) = p_{ij}(t)$ which form a stochastic matrix $P(t) = \|p_{ij}(t)\|$ (with
$p_{ij}(t) \ge 0$, $\sum_j p_{ij}(t) = 1$). Chapman–Kolmogorov’s equation now takes the form

$$p_{ij}(t + s) = \sum_k p_{ik}(t)\, p_{kj}(s),$$

or, which is the same, in the matrix form,

$$P(t + s) = P(t)P(s) = P(s)P(t). \tag{21.2.1}$$


In what follows, we consider only stochastically continuous processes for which
$\xi(t + s) \xrightarrow{p} \xi(t)$ as $s \to 0$, which is equivalent in the case under consideration to
each of the following three relations:

$$P(\xi(t + s) \ne \xi(t)) \to 0, \qquad P(t + s) \to P(t), \qquad P(s) \to P(0) \equiv E \tag{21.2.2}$$

as $s \to 0$ (component-wise; $E$ is the unit matrix).

We will also assume that convergence in (21.2.2) is uniform (for a finite $\mathcal{X}$ this is
always the case).

According to the separability requirement, we will assume that $\xi(t)$ cannot
change its state in “zero time” more than once (thus excluding the effects illustrated
in Example 18.1.1, i.e. assuming that if $\xi(t) = j$ then, with probability 1,
$\xi(t + s) = j$ for $s \in [0, \tau)$, $\tau = \tau(\omega) > 0$). In that case, the trajectories of the
processes will be piece-wise constant (right-continuous for definiteness), i.e. the time
axis is divided into half-intervals $[0, \tau_1)$, $[\tau_1, \tau_1 + \tau_2)$, $\ldots$, on which $\xi(t)$ is constant.
Put

$$q_j(t) := P(\xi(u) = j,\ 0 \le u < t \mid \xi(0) = j) = P(\tau_1 \ge t).$$


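Theorem 21.2.1 Under the above assumptions (stochastic continuity and separability),

$$q_i(t) = e^{-q_i t},$$

where $q_i < \infty$; moreover, $q_i > 0$ if $p_{ii}(t) \not\equiv 1$. There exist the limits

$$\lim_{t \to 0} \frac{1 - p_{ii}(t)}{t} = q_i, \qquad \lim_{t \to 0} \frac{p_{ij}(t)}{t} = q_{ij}, \quad j \ne i, \tag{21.2.3}$$

where $\sum_{j:\, j \ne i} q_{ij} = q_i$.
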
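Proof By the Markov property,

$$q_i(t + s) = q_i(t)\, q_i(s),$$

and $q_i(t) \downarrow$. Therefore there exists a unique solution $q_i(t) = e^{-q_i t}$ of this equation,
where $q_i < \infty$, since $P(\tau_1 > 0) = 1$, and $q_i > 0$, because $q_i(t) < 1$ when $p_{ii}(t) \not\equiv 1$.

Let further $0 < t_0 < t_1 < \cdots < t_n < t$. Since the events

$$\{\xi(u) = i \text{ for } u \le t_r,\ \xi(t_{r+1}) = j\}, \qquad r = 0, \ldots, n - 1, \quad j \ne i,$$

are disjoint,

$$p_{ii}(t) = q_i(t) + \sum_{r=0}^{n-1} \sum_{j:\, j \ne i} q_i(t_r)\, p_{ij}(t_{r+1} - t_r)\, p_{ji}(t - t_{r+1}). \tag{21.2.4}$$

Here, by condition (21.2.2), $p_{ji}(t - t_{r+1}) \le \varepsilon_t$ for all $j \ne i$, and $\varepsilon_t \to 0$ as $t \to 0$,
so that the sum in (21.2.4) does not exceed

$$\varepsilon_t \sum_{r=0}^{n-1} \sum_{j:\, j \ne i} q_i(t_r)\, p_{ij}(t_{r+1} - t_r) = \varepsilon_t\, P\Bigl(\bigcup_{r=1}^{n} \{\xi(t_r) \ne i\}\Bigr) \le \varepsilon_t (1 - q_i(t)),$$

$$p_{ii}(t) \le q_i(t) + \varepsilon_t (1 - q_i(t)).$$

Together with the obvious inequality $p_{ii}(t) \ge q_i(t)$ this gives

$$1 - q_i(t) \ge 1 - p_{ii}(t) \ge (1 - q_i(t))(1 - \varepsilon_t)$$

(i.e. the asymptotic behaviour of $1 - q_i(t)$ and $1 - p_{ii}(t)$ as $t \to 0$ is identical).
This implies the second assertion of the theorem (i.e., the first relation in (21.2.3)).

Now let $t_r := rt/n$. Consider the transition probabilities

$$\begin{aligned}
p_{ij}(t) &\ge \sum_{r=0}^{n-1} q_i(t_r)\, p_{ij}(t/n)\, q_j(t - t_{r+1}) \\
&\ge (1 - \varepsilon_t)\, p_{ij}(t/n) \sum_{r=0}^{n-1} e^{-q_i r t/n} \ge (1 - \varepsilon_t)\, p_{ij}(t/n)\, \frac{(1 - e^{-q_i t})\, n}{q_i t}.
\end{aligned}$$

This implies that

$$p_{ij}(t) \ge (1 - \varepsilon_t)\, \frac{1 - e^{-q_i t}}{q_i}\, \limsup_{\delta \to 0} \frac{p_{ij}(\delta)}{\delta}$$

and that the upper limit on the right-hand side is bounded. Passing to the limit as
$t \to 0$, we obtain

$$\liminf_{t \to 0} \frac{p_{ij}(t)}{t} \ge \limsup_{\delta \to 0} \frac{p_{ij}(\delta)}{\delta}.$$

Since $\sum_{j:\, j \ne i} p_{ij}(t) = 1 - p_{ii}(t)$, we have $\sum_{j:\, j \ne i} q_{ij} = q_i$. The theorem is
proved.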


The theorem shows that the quantities

$$p_{ij} = \frac{q_{ij}}{q_i}, \quad j \ne i, \qquad p_{ii} = 0$$

form a stochastic matrix and give the probabilities of transition from $i$ to $j$ during
an infinitesimal time interval $\Delta$ given the process $\xi(t)$ left the state $i$ during that
time interval:

$$P\bigl(\xi(t + \Delta) = j \bigm| \xi(t) = i,\ \xi(t + \Delta) \ne i\bigr) = \frac{p_{ij}(\Delta)}{1 - p_{ii}(\Delta)} \to \frac{q_{ij}}{q_i}$$

as $\Delta \to 0$.

Thus the evolution of $\xi(t)$ can be thought of as follows. If $\xi(0) = X_0$, then $\xi(t)$
stays at $X_0$ for a random time $\tau_1 \mathrel{\subset\!\!\!=} \boldsymbol{\Gamma}_{q_{X_0}}$. Then $\xi(t)$ passes to a state $X_1$ with
probability $p_{X_0 X_1}$. Further, $\xi(t) = X_1$ over the time interval $[\tau_1, \tau_1 + \tau_2)$, $\tau_2 \mathrel{\subset\!\!\!=} \boldsymbol{\Gamma}_{q_{X_1}}$,
after which the system changes its state to $X_2$ and so on. It is clear that $X_0, X_1, \ldots$
is a homogeneous Markov chain with the transition matrix $\|p_{ij}\|$. Therefore the
further study of $\xi(t)$ can be reduced in many aspects to that of the Markov chain
$\{X_n;\ n \ge 0\}$, which was carried out in detail in Chap. 13.
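The description above translates directly into a simulation scheme: draw an exponential sojourn time with rate $q_i$, then move according to the embedded chain $p_{ij} = q_{ij}/q_i$. The Python sketch below is a minimal illustration under stated assumptions: the function name, the three-state rate matrix and the time horizon are hypothetical.

```python
import random

def simulate_chain(Q, x0, horizon, rng):
    """Simulate xi(t) on [0, horizon) from a rate matrix Q whose rows sum to 0.

    Returns the list of (jump_epoch, new_state) pairs, starting with (0.0, x0).
    """
    path = [(0.0, x0)]
    t, i = 0.0, x0
    while True:
        q_i = -Q[i][i]
        if q_i == 0.0:                  # absorbing state: no further jumps
            return path
        t += rng.expovariate(q_i)       # exponential sojourn time with rate q_i
        if t >= horizon:
            return path
        # Embedded chain: jump to j != i with probability q_ij / q_i.
        states = [j for j in range(len(Q)) if j != i]
        weights = [Q[i][j] for j in states]
        i = rng.choices(states, weights=weights)[0]
        path.append((t, i))

# Hypothetical rate matrix: off-diagonal entries q_ij, diagonal entries -q_i.
Q = [[-2.0, 1.5, 0.5],
     [1.0, -1.0, 0.0],
     [0.5, 0.5, -1.0]]

path = simulate_chain(Q, 0, 50.0, random.Random(1))
print(len(path), path[:3])
```

The second components of the returned pairs form a trajectory of the chain $\{X_n\}$, and the differences of the first components are the sojourn times.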

We see that the evolution of $\xi(t)$ is completely specified by the quantities $q_{ij}$ and
$q_i$ forming the matrix

$$Q = \|q_{ij}\| = \lim_{t \to 0} \frac{P(t) - P(0)}{t}, \tag{21.2.5}$$

where we put $q_{ii} := -q_i$, so that $\sum_j q_{ij} = 0$. We can also justify this claim using
an analytical approach. To simplify the technical side of the exposition, we will
assume, where it is needed, that the entries of the matrix $Q$ are bounded and
convergence in (21.2.3) is uniform in $i$.

Denote by $e^A$ the matrix-valued function

$$e^A = E + \sum_{k=1}^{\infty} \frac{1}{k!} A^k.$$


Theorem 21.2.2 The transition probabilities $p_{ij}(t)$ satisfy the systems of differential
equations

$$P'(t) = P(t)Q, \tag{21.2.6}$$

$$P'(t) = QP(t). \tag{21.2.7}$$

Each of the systems (21.2.6) and (21.2.7) has a unique solution

$$P(t) = e^{Qt}.$$

It is clear that the solution can be obtained immediately by formally integrating
equation (21.2.6).

Proof By virtue of (21.2.1), (21.2.2) and (21.2.5),

$$P'(t) = \lim_{s \to 0} \frac{P(t + s) - P(t)}{s} = P(t) \lim_{s \to 0} \frac{P(s) - E}{s} = P(t)Q. \tag{21.2.8}$$

In the same way we obtain, from the equality

$$P(t + s) - P(t) = (P(s) - E)P(t),$$

the second equation (21.2.7). The passage to the limit is justified by the assumptions
we made.

Further, it follows from (21.2.6) that the function $P(t)$ is infinitely differentiable,
and

$$P^{(k)}(t) = P(t)Q^k,$$

$$P(t) - P(0) = \sum_{k=1}^{\infty} P^{(k)}(0)\, \frac{t^k}{k!} = \sum_{k=1}^{\infty} \frac{Q^k t^k}{k!},$$

$$P(t) = P(0)\, e^{Qt}.$$

The theorem is proved.
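The series for $e^{Qt}$ can be summed numerically for a small state space. The sketch below is a minimal illustration only: the helper names, the truncation after 60 terms and the two-state rates $a$, $b$ are assumptions. For the two-state process with rates $a$ (from 0 to 1) and $b$ (from 1 to 0) it compares the series with the known closed form $p_{00}(t) = \frac{b}{a+b} + \frac{a}{a+b} e^{-(a+b)t}$.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(Q, t, terms=60):
    """Partial sum E + sum_{k=1}^{terms} (Qt)^k / k! of the series for e^{Qt}."""
    n = len(Q)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes (Qt)^k / k! from (Qt)^(k-1) / (k-1)!
        term = mat_mul(term, [[q * t / k for q in row] for row in Q])
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

a, b = 2.0, 1.0                         # hypothetical rates 0 -> 1 and 1 -> 0
Q = [[-a, a], [b, -b]]
t = 0.8
P = mat_exp(Q, t)
closed_form_p00 = b / (a + b) + a / (a + b) * math.exp(-(a + b) * t)
print(P[0][0], closed_form_p00)
```

The same function can be used to check (21.2.1) and both systems (21.2.6), (21.2.7) by finite differences.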




Because of the derivation method, (21.2.6) is called the backward Kolmogorov 
equation, and (21.2.7) is known as the forward Kolmogorov equation (the time in- 
crement is taken after or before the basic time interval). 

The difference between these equations becomes even clearer in the case
of inhomogeneous Markov processes, when the transition probabilities

$$P(\xi(t) = j \mid \xi(s) = i) = p_{ij}(s, t), \qquad s \le t,$$

depend on two time arguments: $s$ and $t$. In that case, (21.2.1) becomes the equality
$P(s, t + u) = P(s, t)P(t, t + u)$, and the backward and forward equations have the
form

$$\frac{\partial P(s, t)}{\partial t} = P(s, t)\,Q(t), \qquad \frac{\partial P(s, t)}{\partial s} = -Q(s)\,P(s, t),$$

respectively, where

$$Q(t) = \lim_{u \to 0} \frac{P(t, t + u) - E}{u}.$$


The reader can derive these relations independently. 

What are the general conditions for existence of a stationary limiting distribu- 
tion? We can use here an approach similar to that employed in Chap. 13. 

Let $\xi^{(i)}(t)$ be a process with the initial value $\xi^{(i)}(0) = i$ and right-continuous
trajectories. For a given $i_0$, put

$$\nu^{(i)} := \min\{t \ge 0 : \xi^{(i)}(t) = i_0\} =: \nu_0,$$

$$\nu_k := \min\{t \ge \nu_{k-1} + 1 : \xi^{(i)}(t) = i_0\}, \qquad k = 1, 2, \ldots.$$

Here in the second formula we consider the values $t \ge \nu_{k-1} + 1$, since for $t \ge \nu_{k-1}$
we would have $\nu_k \equiv \nu_{k-1}$. Clearly, $P(\nu_k - \nu_{k-1} = 1) > 0$, and
$P(\nu_k - \nu_{k-1} \in (t, t + h)) > 0$ for any $t \ge 1$ and $h > 0$ provided that $p_{i_0 i_0}(t) \not\equiv 1$.

Note also that the variables $\nu_k$, $k = 0, 1, \ldots$, are not defined for all elementary
outcomes. We put $\nu_0 = \infty$ if $\xi^{(i)}(t) \ne i_0$ for all $t \ge 0$. A similar convention is used
for $\nu_k$, $k \ge 1$. The following ergodic theorem holds.


Theorem 21.2.3 Let there exist a state $i_0$ such that $E\nu_1 < \infty$ and $P(\nu^{(i)} < \infty) = 1$
for all $i \in \mathcal{X}_0 \subset \mathcal{X}$. Then there exist the limits

$$\lim_{t \to \infty} p_{ij}(t) = p_j \tag{21.2.9}$$

which are independent of $i \in \mathcal{X}_0$.


Proof As was the case for Markov chains, the epochs $\nu_1, \nu_2, \ldots$ divide the time axis
into independent cycles of the same nature, each of them being completed when
the system returns for the first time (after one time unit) to the state $i_0$. Consider
the renewal process generated by the sums $\nu_k$, $k = 0, 1, \ldots$, of independent random
variables $\nu_0$, $\nu_k - \nu_{k-1}$, $k = 1, 2, \ldots$. Let

$$\eta(t) := \min\{k : \nu_k > t\}, \qquad \gamma(t) := t - \nu_{\eta(t)-1}, \qquad H(t) := \sum_{k=0}^{\infty} P(\nu_k < t).$$

The event $A_{dv} := \{\gamma(t) \in [v, v + dv)\}$ can be represented as the intersection of the
events

$$B_{dv} := \bigcup_{k \ge 0} \{\nu_k \in (t - v - dv,\ t - v]\} \in \mathfrak{F}_{t-v}$$

and $C_v := \{\xi(u) \ne i_0 \text{ for } u \in [t - v + 1, t]\} \in \mathfrak{F}_{[t-v,\infty)}$. We have

$$\begin{aligned}
p_{ij}(t) &= \int_0^t P\bigl(\xi(t) = j,\ \gamma(t) \in [v, v + dv)\bigr) = \int_0^t P\bigl(\xi(t) = j,\ B_{dv} C_v\bigr) \\
&= \int_0^t E\bigl[I_{B_{dv}}\, P(\xi(t) = j,\ C_v \mid \mathfrak{F}_{t-v})\bigr] \\
&= \int_0^t E\bigl[I_{B_{dv}}\, P(\xi(t) = j,\ C_v \mid \xi(t - v))\bigr].
\end{aligned}$$

On the set $B_{dv}$, one has $\xi(t - v) = i_0$, and hence the probability inside the last
integral is equal to

$$P\bigl(\xi^{(i_0)}(v) = j,\ \xi^{(i_0)}(u) \ne i_0 \text{ for } u \in [1, v]\bigr) =: g(v)$$

and is independent of $t$ and $i$. Since $P(B_{dv}) = dH(t - v)$, one has

$$p_{ij}(t) = \int_0^t g(v)\, P(B_{dv}) = \int_0^t g(v)\, dH(t - v).$$

By the key renewal theorem, as $t \to \infty$, this integral converges to

$$\frac{1}{E\nu_1} \int_0^{\infty} g(v)\,dv.$$

The existence of the last integral follows from the inequality $g(v) \le P(\nu_1 > v)$. The
theorem is proved.


Theorem 21.2.4 If the stationary distribution

$$P = \lim_{t \to \infty} P(t)$$

exists with all the rows of the matrix $P$ being identical, then it is a unique solution
of the equation

$$PQ = 0. \tag{21.2.10}$$

It is evident that Eq. (21.2.10) is obtained by setting $P'(t) = 0$ in (21.2.6).
Equation (21.2.7) gives the trivial equality $QP = 0$.

Proof Equation (21.2.10) is obtained by passing to the limit in (21.2.8) first as
$t \to \infty$ and then as $s \to 0$. Now assume that $P_1$ is a solution of (21.2.10), i.e.
$P_1 Q = 0$. Then $P_1 P(t) = P_1$ for $t < 1$, since

$$P_1 (P(t) - P(0)) = P_1 \sum_{k=1}^{\infty} \frac{Q^k t^k}{k!} = 0.$$

Further, $P_1 = P_1 P^k(t) = P_1 P(kt)$, $P(kt) \to P$ as $k \to \infty$, and hence
$P_1 = P_1 P = P$. The theorem is proved.


Now consider a Markov chain $\{X_n\}$ in discrete time with transition probabilities
$p_{ij} = q_{ij}/q_i$, $i \ne j$, $p_{ii} = 0$. Suppose that this chain is ergodic (see
Theorem 13.4.1). Then its stationary probabilities $\{\pi_j\}$ satisfy Eqs. (13.4.2). Now note
that Eq. (21.2.10) can be written in the form

$$p_j q_j = \sum_k p_k q_k p_{kj},$$

which has an obvious solution $p_j = c\pi_j/q_j$, $c = \text{const}$. Therefore, if

$$\sum_j \frac{\pi_j}{q_j} < \infty \tag{21.2.11}$$

then there exists a solution to (21.2.10) given by

$$p_j = \frac{\pi_j}{q_j} \left(\sum_k \frac{\pi_k}{q_k}\right)^{-1}. \tag{21.2.12}$$

In Sects. 21.4 and 21.5 we will derive the ergodic theorem for processes of a more
general form than the one in the present section. That theorem will imply, in particular,
that ergodicity of $\{X_n\}$ and convergence (21.2.11) imply (21.2.9). Recall that,
for ergodicity of $\{X_n\}$, it suffices, in turn, that Eqs. (13.4.2) have a solution $\{\pi_j\}$.
Thus the existence of solution (21.2.12) implies the ergodicity of $\xi(t)$.
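Formula (21.2.12) can be illustrated on a finite state space. In the sketch below the function names and the three-state rate matrix are hypothetical. The stationary vector of the embedded chain is found by iterating the "lazy" matrix $(E + \|p_{ij}\|)/2$, a device chosen here only to avoid periodicity issues; it is not taken from the text.

```python
def embedded_chain(Q):
    """Transition matrix p_ij = q_ij / q_i (j != i), p_ii = 0, of the jump chain."""
    n = len(Q)
    return [[0.0 if i == j else Q[i][j] / (-Q[i][i]) for j in range(n)]
            for i in range(n)]

def stationary_of_chain(P, sweeps=500):
    """Stationary row vector of P, found by iterating the lazy chain (E + P) / 2."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(sweeps):
        step = [sum(pi[k] * P[k][j] for k in range(n)) for j in range(n)]
        pi = [0.5 * (pi[j] + step[j]) for j in range(n)]
    return pi

def stationary_of_process(Q):
    """p_j proportional to pi_j / q_j, normalised as in (21.2.12)."""
    pi = stationary_of_chain(embedded_chain(Q))
    weights = [pi[j] / (-Q[j][j]) for j in range(len(Q))]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical rate matrix with q_ii = -q_i.
Q = [[-2.0, 1.5, 0.5],
     [1.0, -1.0, 0.0],
     [0.5, 0.5, -1.0]]

p = stationary_of_process(Q)
residual = [sum(p[k] * Q[k][j] for k in range(3)) for j in range(3)]   # row vector pQ
print(p, residual)
```

The residual vector $pQ$ vanishes up to rounding, in agreement with (21.2.10).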


21.2.2 Examples 


Example 21.2.1 The Poisson process $\xi(t)$ with parameter $\lambda$ is a Markov process for
which $q_i = \lambda$, $q_{i,i+1} = \lambda$, and $p_{i,i+1} = 1$, $i = 0, 1, \ldots$. For this process, the stationary
distribution $p = (p_0, p_1, \ldots)$ does not exist (each trajectory goes to infinity).


Example 21.2.2 Birth-and-death processes. These are processes for which, for
$i \ge 1$,

$$p_{ij}(\Delta) = \begin{cases} \lambda_i \Delta + o(\Delta) & \text{for } j = i + 1, \\ \mu_i \Delta + o(\Delta) & \text{for } j = i - 1, \\ o(\Delta) & \text{for } |j - i| \ge 2, \end{cases}$$

so that

$$p_{ij} = \begin{cases} \dfrac{\lambda_i}{\lambda_i + \mu_i} & \text{for } j = i + 1, \\[2ex] \dfrac{\mu_i}{\lambda_i + \mu_i} & \text{for } j = i - 1 \end{cases}$$

are probabilities of birth and death, respectively, of a particle in a certain population
given that the population consisted of $i$ particles and changed its composition. For
$i = 0$ one should put $\mu_0 := 0$. Establishing conditions for the existence of a stationary
regime is a rather difficult problem (related mainly to finding conditions under
which the trajectory escapes to infinity). If the stationary regime exists, then according
to Theorem 21.2.4 the stationary probabilities $p_j$ can be uniquely determined
from the recursive relations (see Eq. (21.2.10); in our case $q_{ii} = -q_i = -(\lambda_i + \mu_i)$)

$$\begin{aligned}
-p_0 \lambda_0 + p_1 \mu_1 &= 0, \\
p_0 \lambda_0 - p_1 (\lambda_1 + \mu_1) + p_2 \mu_2 &= 0, \\
p_1 \lambda_1 - p_2 (\lambda_2 + \mu_2) + p_3 \mu_3 &= 0, \\
\cdots\cdots\cdots\cdots\cdots\cdots
\end{aligned} \tag{21.2.13}$$

and condition $\sum p_j = 1$.
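Adding the first $k + 1$ equations of (21.2.13) gives $p_{k+1}\mu_{k+1} = p_k \lambda_k$, so the $p_k$ can be computed recursively. The sketch below is a minimal illustration: it assumes a finite state space $\{0, \ldots, N\}$ (that is, $\lambda_N = 0$), and the function name and the constant rates are hypothetical.

```python
def birth_death_stationary(lam, mu):
    """Stationary probabilities on {0, ..., N} from p_{k+1} mu_{k+1} = p_k lam_k.

    lam[k] is the birth rate in state k (lam[N] is ignored, i.e. treated as 0);
    mu[k] is the death rate in state k (mu[0] is ignored, i.e. treated as 0).
    """
    weights = [1.0]
    for k in range(len(lam) - 1):
        weights.append(weights[-1] * lam[k] / mu[k + 1])
    total = sum(weights)
    return [w / total for w in weights]

N = 20
lam = [1.0] * (N + 1)                  # hypothetical birth rates
mu = [0.0] + [1.5] * N                 # hypothetical death rates, mu_0 = 0
p = birth_death_stationary(lam, mu)

# Residuals of the first equation and of the interior equations in (21.2.13).
first = -p[0] * lam[0] + p[1] * mu[1]
interior = [p[k - 1] * lam[k - 1] - p[k] * (lam[k] + mu[k]) + p[k + 1] * mu[k + 1]
            for k in range(1, N)]
print(p[:4], first, max(abs(r) for r in interior))
```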


Example 21.2.3 The telephone lines problem from queueing theory. Suppose we
are given a system consisting of infinitely many communication channels which
are used for telephone conversations. The probability that, for a busy channel, the
transmitted conversation terminates during a small time interval $(t, t + \Delta)$ is equal
to $\mu\Delta + o(\Delta)$. The probability that a request for a new conversation (a new call)
arrives during the same time interval is $\lambda\Delta + o(\Delta)$. Thus the “arrival flow” of calls
is nothing else but the Poisson process with parameter $\lambda$, and the number $\xi(t)$ of
busy channels at time $t$ is the value of the birth-and-death process for which $\lambda_i = \lambda$
and $\mu_i = i\mu$.

In that case, it is not hard to verify with the help of Theorem 21.2.3 that there
always exists a stationary limiting distribution, for which Eqs. (21.2.13) have the
form

$$\lambda p_0 = \mu p_1, \qquad (\lambda + k\mu)\, p_k = \lambda p_{k-1} + (k + 1)\mu\, p_{k+1}, \quad k \ge 1, \tag{21.2.14}$$

$$p_1 = p_0\, \frac{\lambda}{\mu}, \qquad p_2 = \left(\frac{\lambda}{\mu}\right)^2 \frac{p_0}{2}, \qquad \ldots, \qquad p_k = \left(\frac{\lambda}{\mu}\right)^k \frac{p_0}{k!}, \tag{21.2.15}$$

so that $p_0 = e^{-\lambda/\mu}$, and the limiting distribution will be the Poisson law with
parameter $\lambda/\mu$.

If the number of channels $n$ is finite, the calls which find all the lines busy will
be rejected, and in (21.2.13) one has to put $\lambda_n = 0$, $p_{n+1} = p_{n+2} = \cdots = 0$. In
that case, the last equation in (21.2.14) will have the form $\mu n p_n = \lambda p_{n-1}$. Since
the formulas (21.2.15) will remain true for $k \le n$, we obtain the so-called Erlang
formulas for the stationary distribution:

$$p_k = \left(\frac{\lambda}{\mu}\right)^k \frac{1}{k!} \left[\sum_{j=0}^{n} \left(\frac{\lambda}{\mu}\right)^j \frac{1}{j!}\right]^{-1}$$

(the truncated Poisson distribution).
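The Erlang formulas are straightforward to evaluate. In the sketch below the function name and the values $\lambda = 3$, $\mu = 1$, $n = 5$ are assumptions made for illustration. It computes the truncated Poisson distribution and checks the balance relations $\lambda p_{k-1} = k\mu\, p_k$, which follow from (21.2.14); the last probability $p_n$ is the stationary probability that all lines are busy.

```python
import math

def erlang_distribution(lam, mu, n):
    """Stationary distribution p_0, ..., p_n of the number of busy channels."""
    rho = lam / mu
    weights = [rho ** k / math.factorial(k) for k in range(n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

lam, mu, n = 3.0, 1.0, 5               # hypothetical arrival rate, service rate, lines
p = erlang_distribution(lam, mu, n)
all_busy = p[n]                        # stationary probability that all n lines are busy

# With many channels the distribution approaches the Poisson law with parameter lam/mu.
p_many = erlang_distribution(lam, mu, 60)
print(all_busy, p_many[0], math.exp(-lam / mu))
```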




The next example will be considered in a separate section. 


21.3 Branching Processes 


The essence of the mathematical model describing a branching process remains
roughly the same as in Sect. 7.7.2. A continuous time branching process can be
defined as follows. Let $\xi^{(i)}(t)$ denote the number of particles at time $t$ with the
initial condition $\xi^{(i)}(0) = i$. Each particle, independently of all others, splits during
the time interval $(t, t + \Delta)$ with probability $\mu\Delta + o(\Delta)$ into a random number $\eta \ne 1$
of particles (if $\eta = 0$, we say that the particle dies). Thus,

$$\xi^{(i)}(t) = \xi_1^{(1)}(t) + \cdots + \xi_i^{(1)}(t), \tag{21.3.1}$$

where $\xi_k^{(1)}(t)$ are independent and distributed as $\xi^{(1)}(t)$. Moreover,

$$\begin{aligned}
p_{ij}(\Delta) &= i\mu\Delta\, h_{j-i+1} + o(\Delta), \quad j \ne i; \qquad h_k = P(\eta = k); \quad h_1 = 0; \\
p_{ii}(\Delta) &= 1 - i\mu\Delta + o(\Delta),
\end{aligned} \tag{21.3.2}$$

so that here $q_{ij} = i\mu h_{j-i+1}$, $q_{ii} = -i\mu$.

By formula (21.3.2), $i\mu\Delta$ is the principal part of the probability that at least
one particle will split. Clearly, the state 0 is absorbing. It will not be absorbing any
more if one considers processes with immigration when a Poisson process (with
intensity $\lambda$) of “outside” particles is added to the process $\xi^{(i)}(t)$. Then

$$\begin{aligned}
p_{ij}(\Delta) &= i\mu\Delta\, h_{j-i+1} + o(\Delta) \quad \text{for } j - i \ne 0, 1, \\
p_{i,i+1}(\Delta) &= \Delta(i\mu h_2 + \lambda) + o(\Delta).
\end{aligned}$$
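We return to the branching process (21.3.1), (21.3.2). By (21.3.1) we have

$$r^{(i)}(t, z) := E z^{\xi^{(i)}(t)} = \bigl[E z^{\xi^{(1)}(t)}\bigr]^i = r^i(t, z) = \sum_{k=0}^{\infty} z^k p_{ik}(t),$$

where

$$r(t, z) := E z^{\xi^{(1)}(t)} = \sum_{k=0}^{\infty} z^k p_{1k}(t). \tag{21.3.3}$$

Equation (21.2.7) implies

$$p_{1k}'(t) = \sum_{l=0}^{\infty} q_{1l}\, p_{lk}(t).$$

Therefore, differentiating (21.3.3) with respect to $t$, we find that

$$\begin{aligned}
r_t'(t, z) &= \sum_{k=0}^{\infty} z^k p_{1k}'(t) = \sum_{k=0}^{\infty} z^k \sum_{l=0}^{\infty} q_{1l}\, p_{lk}(t) \\
&= \sum_{l=0}^{\infty} q_{1l} \sum_{k=0}^{\infty} z^k p_{lk}(t) = \sum_{l=0}^{\infty} q_{1l}\, r^l(t, z).
\end{aligned} \tag{21.3.4}$$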
I=0  -k=0 1=0 




Fig. 21.1 The form of the plot of the function $f_1$. The smaller root of the equation $f_1(q) = 0$ gives the probability of the eventual extinction of the branching process


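But $q_{1l} = \mu h_l$ for $l \ne 1$, $q_{11} = -\mu$, and putting

$$f(s) := \sum_{l=0}^{\infty} q_{1l} s^l = \mu\bigl(E s^{\eta} - s\bigr) = \mu\Bigl(\sum_{l=0}^{\infty} h_l s^l - s\Bigr)$$

we can write (21.3.4) in the form

$$r_t'(t, z) = f(r(t, z)).$$

We have obtained a differential equation for $r = r(t, z)$ (equivalent to (21.3.2))
which is more convenient to write in the form

$$\frac{dr}{f(r)} = dt, \qquad t = \int_{r(0,z)}^{r(t,z)} \frac{dy}{f(y)} = \int_{z}^{r(t,z)} \frac{dy}{f(y)}.$$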


Consider the behaviour of the function $f_1(y) = E y^{\eta} - y$ on $[0, 1]$. Clearly,
$f_1(0) = P(\eta = 0)$, $f_1(1) = 0$, and

$$f_1'(1) = E\eta - 1, \qquad f_1''(y) = E\eta(\eta - 1) y^{\eta - 2} > 0.$$

Consequently, the function $f_1(y)$ is convex and has no zeros in $(0, 1)$ if $E\eta \le 1$.
When $E\eta > 1$, there exists a point $q \in (0, 1)$ such that $f_1(q) = 0$, $f_1'(q) < 0$ (see
Fig. 21.1), and $f_1(y) = (y - q) f_1'(q) + O((y - q)^2)$ in the vicinity of this point.
Thus if $E\eta > 1$, $z < q$ and $r \uparrow q$, then, by virtue of the representation

$$\frac{1}{f_1(y)} = \frac{1}{(y - q) f_1'(q)} + O(1),$$

we obtain

$$t = \int_z^r \frac{dy}{f(y)} = \frac{1}{\mu f_1'(q)} \ln\left(\frac{r - q}{z - q}\right) + O(1).$$

This implies that, as $t \to \infty$,

$$r(t, z) - q = (z - q)\, e^{\mu t f_1'(q) + O(1)},$$

$$r(t, z) = q + O(e^{-\alpha t}), \qquad \alpha := -\mu f_1'(q) > 0. \tag{21.3.5}$$

In particular, the extinction probability

$$p_{10}(t) = r(t, 0) = q + O(e^{-\alpha t})$$

converges exponentially fast to $q$, $p_{10}(\infty) = q$. Comparing our results with those
from Sect. 7.7, the reader can see that the extinction probability for a discrete time
branching process had the same value (we could also come to this conclusion
directly). Since $p_{k0}(t) = [p_{10}(t)]^k$, one has $p_{k0}(\infty) = q^k$.
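The two statements above (the extinction probability is the smaller root of $f_1(q) = 0$, and $r(t, 0)$ approaches it exponentially fast) can be checked numerically. The following minimal Python sketch makes several assumptions for illustration: the offspring law $h_0 = 1/4$, $h_2 = 3/4$ (so that $E\eta = 3/2$ and $q = 1/3$), the rate $\mu = 1$, the bisection tolerance, and the use of the classical Runge-Kutta scheme for the equation $r_t' = \mu f_1(r)$.

```python
def f1(y, h):
    """f_1(y) = E y^eta - y for an offspring law h = {k: P(eta = k)}."""
    return sum(prob * y ** k for k, prob in h.items()) - y

def extinction_probability(h, tol=1e-12):
    """Smaller root of f_1(q) = 0 by bisection; assumes E eta > 1 and h[0] > 0."""
    lo, hi = 0.0, 1.0 - 1e-9           # f_1(lo) > 0 > f_1(hi) under these assumptions
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f1(mid, h) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def r_at(t, z, h, mu=1.0, steps=4000):
    """Runge-Kutta (RK4) solution of r' = mu * f_1(r), r(0) = z, evaluated at time t."""
    dt, r = t / steps, z
    for _ in range(steps):
        k1 = mu * f1(r, h)
        k2 = mu * f1(r + 0.5 * dt * k1, h)
        k3 = mu * f1(r + 0.5 * dt * k2, h)
        k4 = mu * f1(r + dt * k3, h)
        r += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return r

h = {0: 0.25, 2: 0.75}                 # hypothetical offspring law, E eta = 1.5
q = extinction_probability(h)
for t in (1.0, 5.0, 40.0):
    print(t, r_at(t, 0.0, h), q)
```

For this offspring law $\alpha = -\mu f_1'(q) = 1/2$, so the distance between $r(t, 0)$ and $q$ shrinks roughly like $e^{-t/2}$.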

It follows from (21.3.5) that the remaining “probability mass” of the distribution 
of &(t) quickly moves to infinity as t > oo. 

If En < 1, the above argument remains valid with g replaced with 1, so that the 
extinction probability is pi9(co) = pro(oo) = 1. 


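If $\mathbf{E}\eta = 1$, then

$$
f_1(y) = \frac{f_1''(1)}{2}(y - 1)^2 + O\bigl((y - 1)^3\bigr),
$$

$$
t = \int_z^r \frac{dy}{f(y)} \sim -\frac{2}{\mu f_1''(1)(r - 1)}, \qquad r(t,z) - 1 \sim -\frac{2}{\mu t f_1''(1)}.
$$

Thus the extinction probability $r(t, 0) = p_{10}(t)$ also tends to 1 in this case.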
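
The limiting extinction probability $q$ is the smallest root in $[0, 1]$ of the equation $\mathbf{E}y^{\eta} = y$, and it can be approximated numerically. The following is only a minimal sketch: the function name `extinction_probability` and the two offspring distributions are hypothetical illustrations, not taken from the text, and the fixed-point iteration $y \leftarrow \mathbf{E}y^{\eta}$ started at $y = 0$ is one standard way (among others) of locating the smallest root.

```python
def extinction_probability(offspring_pmf, tol=1e-12, max_iter=10_000):
    """Approximate the smallest root in [0, 1] of E y^eta = y.

    offspring_pmf maps k -> P(eta = k) (a hypothetical input format).
    Iterating y <- E y^eta from y = 0 increases monotonically to that root.
    """
    def generating_function(y):
        return sum(p * y ** k for k, p in offspring_pmf.items())

    y = 0.0
    for _ in range(max_iter):
        y_next = generating_function(y)
        if abs(y_next - y) < tol:
            return y_next
        y = y_next
    return y


# Supercritical example (E eta = 1.5): f_1(y) = 1/4 + (3/4) y^2 - y has roots 1/3 and 1.
q_super = extinction_probability({0: 0.25, 2: 0.75})

# Subcritical example (E eta = 0.5): the only root in [0, 1] is y = 1.
q_sub = extinction_probability({0: 0.75, 2: 0.25})
```

In the supercritical example the iteration settles at the smaller root $1/3$ rather than at the trivial root 1, in agreement with Fig. 21.1.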


21.4 Semi-Markov Processes 


21.4.1 Semi-Markov Processes on the States of a Chain 


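Semi-Markov processes can be described as follows. Let an aperiodic discrete time
irreducible Markov chain $\{X_n\}$ with the state space $\mathcal{X} = \{0, 1, 2, \ldots\}$ be given. To
each state $i$ we put into correspondence the distribution $F_i(t)$ of a positive random
variable $\zeta^{(i)}$:

$$
F_i(t) = \mathbf{P}(\zeta^{(i)} < t).
$$

Consider independent of the chain $\{X_n\}$ and of each other the sequences $\zeta_1^{(i)}, \zeta_2^{(i)}, \ldots$;
$\zeta_j^{(i)} \overset{d}{=} \zeta^{(i)}$, of independent random variables with the distribution $F_i$. Let,
moreover, the distribution of the initial random vector $(X_0, \zeta_0)$, $X_0 \in \mathcal{X}$, $\zeta_0 \ge 0$, be
given. The evolution of the semi-Markov process $\xi(u)$ is described as follows:

$$
\begin{aligned}
\xi(u) &= X_0 \quad \text{for } 0 \le u < \zeta_0,\\
\xi(u) &= X_1 \quad \text{for } \zeta_0 \le u < \zeta_0 + \zeta_1^{(X_1)},\\
\xi(u) &= X_2 \quad \text{for } \zeta_0 + \zeta_1^{(X_1)} \le u < \zeta_0 + \zeta_1^{(X_1)} + \zeta_2^{(X_2)},\\
&\;\;\vdots\\
\xi(u) &= X_n \quad \text{for } Z_{n-1} \le u < Z_n, \qquad Z_n = \zeta_0 + \zeta_1^{(X_1)} + \cdots + \zeta_n^{(X_n)},
\end{aligned}
\tag{21.4.1}
$$

and so on. Thus, upon entering state $X_n = j$, the trajectory of $\xi(u)$ remains in that
state for a random time $\zeta_n^{(X_n)} = \zeta_n^{(j)}$, then switches to state $X_{n+1}$ and so on. It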


is evident that such a process is, generally speaking, not Markovian. It will be a 
Markov process only if 


$$
1 - F_i(t) = e^{-q_i t}, \qquad q_i > 0,
$$


and will then coincide with the process described in Sect. 21.2. 


Fig. 21.2 The trajectories of the semi-Markov process $\xi(t)$ and of the residual sojourn time process $\chi(t)$


If the distribution $F_i$ is not exponential, then, given the value $\xi(t) = i$, the time
between $t$ and the next jump epoch will depend on the epoch of the preceding jump
of $\xi(\cdot)$, because

$$
\mathbf{P}\bigl(\zeta^{(i)} > v + u \mid \zeta^{(i)} > v\bigr) = \frac{1 - F_i(v + u)}{1 - F_i(v)}
$$

for non-exponential $F_i$ depends on $v$. It is this property that means that the process
is non-Markovian, for fixing the “present” (i.e. the value of $\xi(t)$) does not make the
“future” of the process $\xi(u)$ independent of the “past” (i.e. of the trajectory of $\xi(u)$
for $u < t$).

The process $\xi(t)$ can be “complemented” to a Markov one by adding to it the
component $\chi(t)$ of which the value gives the time $u$ for which the trajectory $\xi(t + u)$,
$u \ge 0$, will remain in the current state $\xi(t)$. In other words, $\chi(t)$ is the excess of
level $t$ for the random walk $Z_0, Z_1, \ldots$ (see Fig. 21.2):

$$
\chi(t) = Z_{\nu(t)+1} - t, \qquad \nu(t) = \max\{k : Z_k \le t\}.
$$

The process $\chi(t)$ is Markovian and has “saw-like” trajectories deterministic
inside the intervals $(Z_k, Z_{k+1})$. The process $X(t) = (\xi(t), \chi(t))$ is obviously
Markovian, since the value of $X(t)$ uniquely determines the law of evolution of the process
$X(t + u)$ for $u \ge 0$ whatever the “history” $X(v)$, $v < t$, is. Similarly, we could
consider the Markov process $Y(t) = (\xi(t), \gamma(t))$, where $\gamma(t)$ is the defect of level $t$ for
the walk $Z_0, Z_1, \ldots$:

$$
\gamma(t) = t - Z_{\nu(t)}.
$$


21.4.2 The Ergodic Theorem 


In the sequel, we will distinguish between the following two cases. 




(A) The arithmetic case when the possible values of $\zeta^{(i)}$, $i = 0, 1, \ldots$, are
multiples of a certain value $h$ which can be assumed without loss of generality to be
equal to 1. In that case we will also assume that the g.c.d. of the possible values of
the sums of the variables $\zeta^{(i)}$ is also equal to $h = 1$. This is clearly equivalent to
assuming that the g.c.d. of the possible values of recurrence times $\theta^{(i)}$ of $\xi(t)$ to the
state $i$ is equal to 1 for any fixed $i$.

(NA) The non-arithmetic case, when condition (A) does not hold.

Put $a_i := \mathbf{E}\zeta^{(i)}$.


Theorem 21.4.1 Let the Markov chain $\{X_n\}$ be ergodic (satisfy the conditions of
Theorem 13.4.1) and $\{\pi_j\}$ be the stationary distribution of that chain. Then, in the
non-arithmetic case (NA), for any initial distribution $(\zeta_0, X_0)$ there exists the limit

$$
\lim_{t \to \infty} \mathbf{P}\bigl(\xi(t) = i,\ \chi(t) > v\bigr) = \frac{\pi_i}{\sum_j \pi_j a_j} \int_v^{\infty} \mathbf{P}(\zeta^{(i)} > u)\, du. \tag{21.4.2}
$$

In the arithmetic case (A), (21.4.2) holds for integer-valued $v$ (the integral
becomes a sum in that case). It follows from (21.4.2) that the following limit exists:

$$
\lim_{t \to \infty} \mathbf{P}\bigl(\xi(t) = i\bigr) = \frac{\pi_i a_i}{\sum_j \pi_j a_j}.
$$


Proof For definiteness we restrict ourselves to the non-arithmetic case (NA). In
Sect. 13.4 we considered the times $\tau^{(i)}$ between consecutive visits of $\{X_n\}$ to state $i$.
These times could be called “embedded”, as well as the chain $\{X_n\}$ itself in regard
to the process $\xi(t)$. Along with the times $\tau^{(i)}$, we will need the “real” times $\theta^{(i)}$
between the visits of the process $\xi(t)$ to the state $i$. Let, for instance, $X_1 = 1$. Then

$$
\theta^{(1)} \overset{d}{=} \zeta_1^{(X_1)} + \zeta_2^{(X_2)} + \cdots + \zeta_{\tau}^{(X_{\tau})},
$$

where $\tau = \tau^{(1)}$. For definiteness and to reduce notation, we fix for the moment the
value $i = 1$ and put $\theta^{(1)} =: \theta$. Let first

$$
\zeta_0 \overset{d}{=} \zeta^{(1)}, \qquad X_0 = 1. \tag{21.4.3}
$$

Then the whole trajectory of the process $X(t)$ for $t \ge 0$ will be divided into
identically distributed independent cycles by the epochs when the process hits the state
$\xi(t) = 1$. We denote the lengths of these cycles by $\theta_1, \theta_2, \ldots$; they are independent
and identically distributed. We show that

$$
\mathbf{E}\theta = \frac{1}{\pi_1} \sum_j a_j \pi_j. \tag{21.4.4}
$$

Denote by $\theta(n)$ the “real” time spent on $n$ transitions of the governing
chain $\{X_n\}$. Then

$$
\theta_1 + \cdots + \theta_{\eta(n)-1} \le \theta(n) \le \theta_1 + \cdots + \theta_{\eta(n)}, \tag{21.4.5}
$$

where $\eta(n) := \min\{k : T_k > n\}$, $T_k = \sum_{j=1}^{k} \tau_j$, $\tau_j$ are independent and distributed
as $\tau$. We prove that, as $n \to \infty$,

$$
\mathbf{E}\theta(n) \sim n \pi_1 \mathbf{E}\theta. \tag{21.4.6}
$$




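By Wald’s identity and (21.4.5),

$$
\mathbf{E}\theta(n) \le \mathbf{E}\theta\, \mathbf{E}\eta(n), \tag{21.4.7}
$$

where $\mathbf{E}\eta(n) \sim n / \mathbf{E}\tau = n\pi_1$.

Now we bound from below the expectation $\mathbf{E}\theta(n)$. Put $m := \lfloor n\pi_1 - \varepsilon n \rfloor$,
$\Theta_m := \sum_{j=1}^{m} \theta_j$. Then

$$
\mathbf{E}\theta(n) \ge \mathbf{E}\bigl(\theta(n);\ \eta(n) > m\bigr)
\ge \mathbf{E}\bigl(\Theta_m;\ \eta(n) > m\bigr) = m\mathbf{E}\theta - \mathbf{E}\bigl(\Theta_m;\ \eta(n) \le m\bigr). \tag{21.4.8}
$$

Here the random variable $\Theta_m / m \ge 0$ possesses the properties

$$
\Theta_m / m \overset{p}{\to} \mathbf{E}\theta \quad \text{as } m \to \infty, \qquad \mathbf{E}(\Theta_m / m) = \mathbf{E}\theta.
$$

Therefore it satisfies the conditions of part 4 of Lemma 6.1.1 and is uniformly
integrable. This, in turn, by Lemma 6.1.2 and convergence $\mathbf{P}(\eta(n) \le m) \to 0$ means
that the last term on the right-hand side of (21.4.8) is $o(m)$. By virtue of (21.4.8),
since $\varepsilon > 0$ is arbitrary, we obtain that

$$
\liminf_{n \to \infty} n^{-1} \mathbf{E}\theta(n) \ge \pi_1 \mathbf{E}\theta.
$$

This together with (21.4.7) proves (21.4.6).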
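Now we will calculate the value of $\mathbf{E}\theta(n)$ using another approach. The variable
$\theta(n)$ admits the representation

$$
\theta(n) = \sum_j \sum_{k=1}^{N(j,n)} \zeta_k^{(j)},
$$

where $N(j, n)$ is the number of visits of the trajectory of $\{X_k\}$ to the state $j$ during
the first $n$ steps. Since $\{\zeta_k^{(j)}\}_{k \ge 1}$ and $N(j, n)$ are independent for each $j$, we have

$$
\mathbf{E}\theta(n) = \sum_j a_j \mathbf{E}N(j,n), \qquad \mathbf{E}N(j,n) = \sum_{k=1}^{n} p_{1j}(k).
$$

Because $p_{1j}(k) \to \pi_j$ as $k \to \infty$, one has

$$
\lim_{n \to \infty} n^{-1} \mathbf{E}N(j,n) = \pi_j.
$$

Moreover,

$$
\pi_j = \sum_l \pi_l p_{lj}(k) \ge \pi_1 p_{1j}(k)
$$

and, therefore,

$$
p_{1j}(k) \le \pi_j / \pi_1.
$$

Hence

$$
n^{-1} \mathbf{E}N(j,n) \le \pi_j / \pi_1,
$$

and in the case when $\sum_j a_j \pi_j < \infty$, the series $\sum_j a_j n^{-1} \mathbf{E}N(j,n)$ converges
uniformly in $n$. Consequently, the following limit exists:

$$
\lim_{n \to \infty} n^{-1} \mathbf{E}\theta(n) = \sum_j a_j \pi_j.
$$

Comparing this with (21.4.6) we obtain (21.4.4). If $\mathbf{E}\theta = \infty$ then clearly
$\mathbf{E}\theta(n) = \infty$ and $\sum_j a_j \pi_j = \infty$, and vice versa, if $\sum_j a_j \pi_j = \infty$ then $\mathbf{E}\theta = \infty$.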
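Consider now the random walk $\{\Theta_k\}$. To the $k$-th cycle there correspond $\tau_k$
transitions. Therefore, by the total probability formula,

$$
\mathbf{P}\bigl(\xi(t) = 1,\ \chi(t) > v\bigr) = \sum_{k=1}^{\infty} \int_0^t \mathbf{P}\bigl(\Theta_k \in du,\ \zeta_{T_k}^{(1)} > t - u + v\bigr),
$$

where $\zeta_{T_k}^{(1)}$ is independent of $\Theta_k$ and distributed as $\zeta^{(1)}$ (see Lemma 11.2.1 or
the strong Markov property). Therefore, denoting by $H_{\theta}(u) := \sum_{k=1}^{\infty} \mathbf{P}(\Theta_k < u)$
the renewal function for the sequence $\{\Theta_k\}$, we obtain for the non-arithmetic case
(NA), by virtue of the renewal theorem (see Theorem 10.4.1 and (10.4.2)), that, as
$t \to \infty$,

$$
\begin{aligned}
\mathbf{P}\bigl(\xi(t) = 1,\ \chi(t) > v\bigr)
&= \int_0^t dH_{\theta}(u)\, \mathbf{P}\bigl(\zeta^{(1)} > t - u + v\bigr)\\
&\to \frac{1}{\mathbf{E}\theta} \int_0^{\infty} \mathbf{P}\bigl(\zeta^{(1)} > u + v\bigr)\, du = \frac{1}{\mathbf{E}\theta} \int_v^{\infty} \mathbf{P}\bigl(\zeta^{(1)} > u\bigr)\, du.
\end{aligned}
\tag{21.4.9}
$$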


We have proved assertion (21.4.2) for $i = 1$ and initial conditions (21.4.3). The
transition to arbitrary initial conditions is quite obvious and is done in exactly the
same way as in the proof of the ergodic theorems of Chap. 13.

If $\sum_j a_j \pi_j = \infty$ then, as we have already observed, $\mathbf{E}\theta = \infty$ and, by the renewal
theorem and (21.4.9), one has $\mathbf{P}(\xi(t) = 1,\ \chi(t) > v) \to 0$ as $t \to \infty$. It remains to
note that instead of $i = 1$ we can fix any other value of $i$. The theorem is proved.
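
The limit $\lim_{t\to\infty} \mathbf{P}(\xi(t) = i) = \pi_i a_i / \sum_j \pi_j a_j$ from Theorem 21.4.1 is easy to evaluate for a finite chain. The sketch below is an illustration only: the function names, the two-state transition matrix and the mean sojourn times are hypothetical, and the stationary distribution is approximated by plain power iteration (which assumes, as the theorem does, an ergodic chain).

```python
def stationary_distribution(P, n_iter=10_000):
    """Approximate the stationary row vector pi = pi P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def semi_markov_limit(P, mean_sojourn):
    """Limiting probabilities pi_i a_i / sum_j pi_j a_j of the semi-Markov process."""
    pi = stationary_distribution(P)
    weights = [p * a for p, a in zip(pi, mean_sojourn)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical embedded chain with stationary distribution (2/7, 5/7)
# and mean sojourn times a = (3, 1).
P = [[0.5, 0.5],
     [0.2, 0.8]]
a = [3.0, 1.0]
limit = semi_markov_limit(P, a)
```

Here the embedded chain spends most of its steps in the second state, but the long sojourns in the first state shift the time-stationary weights to $(6/11, 5/11)$.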


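In the same way we could also prove that

$$
\lim_{t \to \infty} \mathbf{P}\bigl(\xi(t) = i,\ \gamma(t) > v\bigr) = \frac{\pi_i}{\sum_j a_j \pi_j} \int_v^{\infty} \mathbf{P}\bigl(\zeta^{(i)} > y\bigr)\, dy,
$$

$$
\lim_{t \to \infty} \mathbf{P}\bigl(\xi(t) = i,\ \chi(t) > u,\ \gamma(t) > v\bigr) = \frac{\pi_i}{\sum_j a_j \pi_j} \int_{u+v}^{\infty} \mathbf{P}\bigl(\zeta^{(i)} > y\bigr)\, dy
$$

(see Theorem 10.4.3).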


21.4.3 Semi-Markov Processes on Chain Transitions 


Along with the semi-Markov processes $\xi(t)$ described at the beginning of the
present section, one sometimes considers semi-Markov processes “given on the
transitions” of the chain $\{X_n\}$. In that case, the distributions $F_{ij}$ of random variables
$\zeta^{(ij)} > 0$ are given and, similarly to (21.4.1), for the initial condition $(X_0, X_1, \zeta_0)$
one puts

$$
\begin{aligned}
\xi(u) &:= (X_0, X_1) \quad \text{for } 0 \le u < \zeta_0,\\
\xi(u) &:= (X_1, X_2) \quad \text{for } \zeta_0 \le u < \zeta_0 + \zeta_1^{(X_1 X_2)},\\
\xi(u) &:= (X_2, X_3) \quad \text{for } \zeta_0 + \zeta_1^{(X_1 X_2)} \le u < \zeta_0 + \zeta_1^{(X_1 X_2)} + \zeta_2^{(X_2 X_3)},
\end{aligned}
\tag{21.4.10}
$$

and so on. Although at first glance this is a very general model, it can be
completely reduced to the semi-Markov processes (21.4.1). To that end, one has to notice
that the “two-dimensional” sequence $Y_n = (X_n, X_{n+1})$, $n = 0, 1, \ldots$, also forms a
Markov chain. Its transition probabilities have the form

$$
p_{(ij)(kl)} = \begin{cases} p_{kl} & \text{for } k = j,\\ 0 & \text{for } k \ne j, \end{cases}
\qquad
p_{(ij)(kl)}(n) = p_{jk}(n - 1)\, p_{kl} \quad \text{for } n > 1,
$$

so that if the chain $\{X_n\}$ is ergodic, then $\{Y_n\}$ is also ergodic and

$$
p_{(ij)(kl)}(n) \to \pi_k p_{kl}.
$$

This enables one to restate Theorem 21.4.1 easily for the semi-Markov
processes (21.4.10) given on the transitions of the Markov chain $\{X_n\}$, since the process
(21.4.10) will be an ordinary semi-Markov process given on the chain $\{Y_n\}$.


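Corollary 21.4.1 If the chain $\{X_n\}$ is ergodic then, in the non-arithmetic case,

$$
\lim_{t \to \infty} \mathbf{P}\bigl(\xi(t) = (i, j),\ \chi(t) > v\bigr)
= \frac{\pi_i p_{ij}}{\sum_{k,l} a_{kl} \pi_k p_{kl}} \int_v^{\infty} \mathbf{P}\bigl(\zeta^{(ij)} > u\bigr)\, du, \qquad a_{kl} = \mathbf{E}\zeta^{(kl)}.
$$

In the arithmetic case $v$ must be a multiple of the lattice span.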


We will make one more remark which could be helpful when studying
semi-Markov processes and which concerns the so-called semi-Markov renewal functions
$H_{ij}(t)$. Denote by $T_{ij}(n)$ the epoch (in the “real time”) of the $n$-th jump of the
process $\xi(t)$ from state $i$ to $j$. Put

$$
H_{ij}(t) := \sum_{n=1}^{\infty} \mathbf{P}\bigl(T_{ij}(n) < t\bigr).
$$

If $\nu_{ij}(t)$ is the number of jumps from state $i$ to $j$ during the time interval $[0, t)$,
then clearly $H_{ij}(t) = \mathbf{E}\nu_{ij}(t)$.

Set $\Delta f(t) := f(t + \Delta) - f(t)$, $\Delta > 0$.


Corollary 21.4.2 In the non-arithmetic case,

$$
\lim_{t \to \infty} \Delta H_{ij}(t) = \frac{\pi_i p_{ij} \Delta}{\sum_l a_l \pi_l}. \tag{21.4.11}
$$

In the arithmetic case $\Delta$ must be a multiple of the lattice span.




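Proof Denote by $\nu_{ij}^{(k)}(u)$ the number of transitions of the process $\xi(t)$ from $i$ to $j$
during the time interval $(0, u)$ given the initial condition $(k, 0)$. Then, by the total
probability formula,

$$
\mathbf{E}\Delta\nu_{ij}(t) = \int_0^{\Delta} \sum_k \mathbf{P}\bigl(\xi(t) = k,\ \chi(t) \in du\bigr)\, \mathbf{E}\nu_{ij}^{(k)}(\Delta - u).
$$

Since $\nu_{ij}^{(k)}(u) \le \nu_{ij}^{(i)}(u)$, by Theorem 21.4.1 one has

$$
h_{ij}(\Delta) := \lim_{t \to \infty} \mathbf{E}\Delta\nu_{ij}(t) = \frac{1}{\sum_l a_l \pi_l} \sum_k \pi_k \int_0^{\Delta} \mathbf{P}\bigl(\zeta^{(k)} > u\bigr)\, \mathbf{E}\nu_{ij}^{(k)}(\Delta - u)\, du. \tag{21.4.12}
$$

Further,

$$
\mathbf{P}\bigl(\zeta^{(k)} < \Delta - u\bigr) \le F_k(\Delta) \to 0
$$

as $\Delta \to 0$, and

$$
\begin{aligned}
\mathbf{P}\bigl(\nu_{ij}^{(k)}(\Delta - u) = s\bigr) &\le \bigl(p_{ij} F_i(\Delta)\bigr)^s, \qquad k \ne i,\\
\mathbf{P}\bigl(\nu_{ij}^{(i)}(\Delta - u) = s + 1\bigr) &\le \bigl(p_{ij} F_i(\Delta)\bigr)^s, \qquad s \ge 1,\\
\mathbf{P}\bigl(\nu_{ij}^{(i)}(\Delta - u) = 1\bigr) &= p_{ij} + o\bigl(F_i(\Delta)\bigr).
\end{aligned}
$$

It follows from the aforesaid that

$$
\mathbf{E}\nu_{ij}^{(k)}(\Delta - u) = o\bigl(F_i(\Delta)\bigr), \qquad \mathbf{E}\nu_{ij}^{(i)}(\Delta - u) = p_{ij} + o\bigl(F_i(\Delta)\bigr).
$$

Therefore,

$$
h_{ij}(\Delta) = \frac{\pi_i p_{ij} \Delta}{\sum_l a_l \pi_l} + o(\Delta). \tag{21.4.13}
$$

Further, from the equality

$$
H_{ij}(t + 2\Delta) - H_{ij}(t) = \Delta H_{ij}(t) + \Delta H_{ij}(t + \Delta)
$$

we obtain that $h_{ij}(2\Delta) = 2h_{ij}(\Delta)$, which means that $h_{ij}(\Delta)$ is linear. Together with
(21.4.13) this proves (21.4.11). The corollary is proved.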


The class of processes for which one can prove ergodicity using the same meth- 
ods as the one used for semi-Markov processes and also in Chap. 13, can be some- 
what extended. For this broader class of processes we will prove in the next section 
the ergodic theorem, and also the laws of large numbers and the central limit theo- 
rem for integrals of such processes. 




21.5 Regenerative Processes 


21.5.1 Regenerative Processes. The Ergodic Theorem 


Let $X(t)$ and $X_0(t)$; $t \ge 0$, be processes given in the space $D(0, \infty)$ of functions
without discontinuities of the second type (the state space of these processes could
be any metric space, not necessarily the real line). The process $X(t)$ is said to be
regenerative if it possesses the following properties:

(1) There exists a state $x_0$ which is visited by the process $X$ with probability 1.
After each such visit, the evolution of the process starts anew as if it were the original
process $X(t)$ starting at the state $X(0) = x_0$. We will denote this new process by
$X_0(t)$ where $X_0(0) = x_0$. To state this property more precisely, we introduce the
time $\tau_0$ of the first visit to $x_0$ by $X$:

$$
\tau_0 := \inf\{t \ge 0 : X(t) = x_0\}.
$$

However, it is not clear from this definition whether $\tau_0$ is a random variable. For
definiteness, assume that the process $X$ is such that for $\tau_0$ one has

$$
\{\tau_0 > t\} = \bigcup_n \bigcap_{u \in S} \bigl\{|X(u) - x_0| > 1/n\bigr\},
$$

where $S$ is a countable set everywhere dense in $[0, t]$. In that case the set $\{\tau_0 > t\}$
is clearly an event and $\tau_0$ is a random variable. The above stated property means
that $\tau_0$ is a proper random variable: $\mathbf{P}(\tau_0 < \infty) = 1$, and that the distribution of
$X(\tau_0 + u)$, $u \ge 0$, coincides with that of $X_0(u)$, $u \ge 0$, whatever the “history” of the
process $X(t)$, $t \le \tau_0$.

(2) The recurrence time $\tau$ of the state $x_0$ has finite expectation $\mathbf{E}\tau < \infty$,
$\tau := \inf\{t : X_0(t) = x_0\}$.

The aforesaid means that the evolution of the process is split into independent 
identically distributed cycles by its visits to the state x9. The visit times to xo are 
called regeneration times. The behaviour of the process inside the cycles may be 
arbitrary, and no further conditions, including Markovity, are imposed. 

We introduce the so-called “taboo probability”

$$
P(t, B) := \mathbf{P}\bigl(X_0(t) \in B,\ \tau > t\bigr).
$$

We will assume that, as a function of $t$, $P(t, B)$ is measurable and Riemann
integrable.


Theorem 21.5.1 Let $X(t)$ be a regenerative process and the random variable $\tau$ be
non-lattice. Then, for any Borel set $B$, as $t \to \infty$,

$$
\mathbf{P}\bigl(X(t) \in B\bigr) \to \pi(B) = \frac{1}{\mathbf{E}\tau} \int_0^{\infty} P(u, B)\, du.
$$

If $\tau$ is a lattice variable (which is the case for processes $X(t)$ in discrete time), the
assertion holds true with the following obvious changes: $t \to \infty$ along the lattice
and the integral is replaced with a sum.




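Proof Let $T_0 := 0$, $T_k := \tau_1 + \cdots + \tau_k$ be the epoch of the $k$-th regeneration of the
process $X_0(t)$, and

$$
H(u) := \sum_{k=0}^{\infty} \mathbf{P}(T_k < u)
$$

($\tau_k \overset{d}{=} \tau$ are independent). Then, using the total probability formula and the key
renewal theorem, we obtain, as $t \to \infty$,

$$
\begin{aligned}
\mathbf{P}\bigl(X_0(t) \in B\bigr) &= \sum_{k=0}^{\infty} \int_0^t \mathbf{P}(T_k \in du)\, P(t - u, B)\\
&= \int_0^t dH(u)\, P(t - u, B) \to \frac{1}{\mathbf{E}\tau} \int_0^{\infty} P(u, B)\, du = \pi(B).
\end{aligned}
$$

For the process $X(t)$ one gets

$$
\mathbf{P}\bigl(X(t) \in B\bigr) = \int_0^t \mathbf{P}(\tau_0 \in du)\, \mathbf{P}\bigl(X_0(t - u) \in B\bigr) \to \pi(B).
$$

The theorem is proved.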


21.5.2 The Laws of Large Numbers and Central Limit Theorem 
for Integrals of Regenerative Processes 


Consider a measurable mapping $f : \mathcal{X} \to \mathbb{R}$ of the state space $\mathcal{X}$ of a process $X(t)$
to the real line $\mathbb{R}$. As in Sect. 21.4.2, for the sake of simplicity, we can assume that
$\mathcal{X} = \mathbb{R}$ and the trajectories of $X(t)$ lie in the space $D(0, \infty)$ of functions without
discontinuities of the second kind. In this case the paths $f(X(u))$, $u \ge 0$, will be
measurable functions, for which the integral

$$
S(t) = \int_0^t f\bigl(X(u)\bigr)\, du
$$

is well defined. For such integrals we have the following law of large numbers. Set

$$
\zeta := \int_0^{\tau} f\bigl(X_0(u)\bigr)\, du, \qquad a := \mathbf{E}\tau.
$$

Theorem 21.5.2 Let the conditions of Theorem 21.5.1 be satisfied and there exist
$a_{\zeta} := \mathbf{E}\zeta$. Then, as $t \to \infty$,

$$
\frac{S(t)}{t} \overset{p}{\to} \frac{a_{\zeta}}{a}.
$$

For conditions of existence of $\mathbf{E}\zeta$, see Theorem 21.5.4 below.




Proof The proof of the theorem largely repeats that of the similar assertion
(Theorem 13.8.1) for sums of random variables defined on a Markov chain. Divide the
domain $u \ge 0$ into half-intervals

$$
(0, T_0], \quad (T_{k-1}, T_k], \quad k \ge 1, \quad T_0 = \tau_0,
$$

where $T_k$ are the epochs of hitting the state $x_0$ by the process $X(t)$, $\tau_k = T_k - T_{k-1}$
for $k \ge 1$ are independent and distributed as $\tau$. Then the random variables

$$
\zeta_k = \int_{T_{k-1}}^{T_k} f\bigl(X(u)\bigr)\, du, \qquad k \ge 1,
$$

are independent, distributed as $\zeta$, and have finite expectation $a_{\zeta}$. The integral $S(t)$
can be represented as

$$
S(t) = z_0 + \sum_{k=1}^{\nu(t)} \zeta_k + z_t,
$$

where

$$
\nu(t) := \max\{k : T_k \le t\}, \qquad z_0 = \int_0^{T_0} f\bigl(X(u)\bigr)\, du, \qquad z_t = \int_{T_{\nu(t)}}^{t} f\bigl(X(u)\bigr)\, du.
$$

Since $\tau_0$ is a proper random variable, $z_0$ is a proper random variable as well, and
hence $z_0 / t \overset{a.s.}{\to} 0$ as $t \to \infty$. Further,

$$
z_t \overset{d}{=} \int_0^{\gamma(t)} f\bigl(X_0(u)\bigr)\, du,
$$

where $\gamma(t) = t - T_{\nu(t)}$ has a proper limiting distribution as $t \to \infty$ (see Chap. 10),
so $z_t / t \overset{p}{\to} 0$ as $t \to \infty$. The sum $S_{\nu(t)} = \sum_{k=1}^{\nu(t)} \zeta_k$ is nothing else but the generalised
renewal process studied in Chaps. 10 and 11. By Theorem 11.5.2, as $t \to \infty$,

$$
\frac{S_{\nu(t)}}{t} \overset{p}{\to} \frac{a_{\zeta}}{a}.
$$

The theorem is proved.
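
The convergence $S(t)/t \to a_{\zeta}/a$ can be observed in a small simulation. The sketch below is an illustration under assumptions of our own choosing, none of which come from the text: the process alternates between an “on” state (value 1) and an “off” state (value 0) with exponentially distributed sojourn times, regenerates at the start of each “on” period, and $f(x) = x$, so that $\zeta$ is the length of one “on” period and $a_{\zeta}/a$ is the mean “on” time divided by the mean cycle length.

```python
import random


def time_average_on_off(t_max, mean_on, mean_off, seed=0):
    """Return S(t_max)/t_max for an alternating on/off process with f(x) = x.

    Each cycle is an Exp(mean_on) "on" period followed by an Exp(mean_off)
    "off" period; the cycle starts are the regeneration times.
    """
    rng = random.Random(seed)
    t = 0.0          # current time
    on_time = 0.0    # integral of f(X(u)) over [0, t], truncated at t_max
    while t < t_max:
        on_period = rng.expovariate(1.0 / mean_on)
        on_time += min(on_period, t_max - t)
        t += on_period
        if t >= t_max:
            break
        t += rng.expovariate(1.0 / mean_off)
    return on_time / t_max


# Hypothetical parameters: a_zeta = 1, a = 1 + 2 = 3, so the limit is 1/3.
estimate = time_average_on_off(t_max=30_000.0, mean_on=1.0, mean_off=2.0)
```

With these parameters the time average should be close to $1/3$; the size of the fluctuations around that value is what the central limit theorem of this section describes.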


In order to prove the strong law of large numbers we need a somewhat more
restrictive condition than that in Theorem 21.5.2. Put

$$
\zeta^* := \int_0^{\tau} \bigl|f\bigl(X_0(u)\bigr)\bigr|\, du.
$$

Theorem 21.5.3 Let the conditions of Theorem 21.5.1 be satisfied and $\mathbf{E}\zeta^* < \infty$.
Then

$$
\frac{S(t)}{t} \overset{a.s.}{\to} \frac{a_{\zeta}}{a}.
$$

The proof essentially repeats (as was the case for Theorem 21.5.2) that of the law
of large numbers for sums of random variables defined on a Markov chain (see
Theorem 13.8.3). One only needs to use, instead of (13.8.18), the relation

$$
\sup_{T_k \le u \le T_{k+1}} \Bigl| \int_{T_k}^{u} f\bigl(X(v)\bigr)\, dv \Bigr| \le \zeta_k^* := \int_{T_k}^{T_{k+1}} \bigl|f\bigl(X(v)\bigr)\bigr|\, dv
$$

and the fact that $\mathbf{E}\zeta_k^* < \infty$. The theorem is proved.


Here an analogue of Theorem 13.8.2, in which the conditions of existence of
$\mathbf{E}\zeta^*$ and $\mathbf{E}\zeta$ are elucidated, is the following.


Theorem 21.5.4 (Generalisation of Wald’s identity) Let the conditions of
Theorem 21.5.1 be met and there exist

$$
\mathbf{E}\bigl|f\bigl(X(\infty)\bigr)\bigr| = \int |f(x)|\, \pi(dx),
$$

where $X(\infty)$ is a random variable with the stationary distribution $\pi$. Then there
exist

$$
\mathbf{E}\zeta^* = \mathbf{E}\tau\, \mathbf{E}\bigl|f\bigl(X(\infty)\bigr)\bigr|, \qquad \mathbf{E}\zeta = \mathbf{E}\tau\, \mathbf{E}f\bigl(X(\infty)\bigr).
$$

The proof of Theorem 21.5.4 repeats, with obvious changes, that of
Theorem 13.8.2.


Theorem 21.5.5 (The central limit theorem) Let the conditions of Theorem 21.5.1
be met and $\mathbf{E}\tau^2 < \infty$, $\mathbf{E}\zeta^2 < \infty$. Then

$$
\frac{S(t) - rt}{d\sqrt{t/a}} \Rightarrow \Phi_{0,1}, \qquad t \to \infty,
$$

where $r = a_{\zeta}/a$, $d^2 = \mathbf{D}(\zeta - r\tau)$.

The proof, as in the case of Theorems 21.5.2–21.5.4, repeats, up to evident
changes, that of Theorem 13.8.4.


Here an analogue of Theorem 13.8.5 (on the conditions of existence of variance
and on an identity for $a^{-1}d^2$) looks more complicated than under the conditions of
Sect. 13.8 and is omitted.


21.6 Diffusion Processes 


Now we will consider an important class of Markov processes with continuous tra- 
jectories. 


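Definition 21.6.1 A homogeneous Markov process $\xi(t)$ with state space $\langle \mathbb{R}, \mathfrak{B} \rangle$
and the transition function $P(t, x, B)$ is said to be a diffusion process if, for some
finite functions $a(x)$ and $b^2(x) > 0$,

(1) $\lim_{\Delta \to 0} \frac{1}{\Delta} \int (y - x)\, P(\Delta, x, dy) = a(x)$,
(2) $\lim_{\Delta \to 0} \frac{1}{\Delta} \int (y - x)^2\, P(\Delta, x, dy) = b^2(x)$,
(3) for some $\delta > 0$ and $c < \infty$,

$$
\int |y - x|^{2+\delta}\, P(\Delta, x, dy) < c\Delta^{1+\delta/2}.
$$

Put $\Delta\xi(t) := \xi(t + \Delta) - \xi(t)$. Then the above conditions can be written in the
form:

$$
\begin{aligned}
\mathbf{E}\bigl[\Delta\xi(t) \mid \xi(t) = x\bigr] &\sim a(x)\Delta,\\
\mathbf{E}\bigl[(\Delta\xi(t))^2 \mid \xi(t) = x\bigr] &\sim b^2(x)\Delta,\\
\mathbf{E}\bigl[|\Delta\xi(t)|^{2+\delta} \mid \xi(t) = x\bigr] &< c\Delta^{1+\delta/2} \quad \text{as } \Delta \to 0.
\end{aligned}
$$

The coefficients $a(x)$ and $b(x)$ are called the shift and diffusion coefficients,
respectively. Condition (3) is an analogue of the Lyapunov condition. It could be
replaced with a Lindeberg type condition:

(3a) $\mathbf{E}\bigl[(\Delta\xi(t))^2;\ |\Delta\xi(t)| > \varepsilon\bigr] = o(\Delta)$ for any $\varepsilon > 0$ as $\Delta \to 0$.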


It follows immediately from condition (3) and the Kolmogorov theorem that a
diffusion process $\xi(t)$ can be thought of as a process with continuous trajectories.
The standard Wiener process $w(t)$ is a diffusion process, since in that case

$$
P(t; x, B) = \frac{1}{\sqrt{2\pi t}} \int_B e^{-(x-y)^2/(2t)}\, dy,
$$

$$
\mathbf{E}\Delta w(t) = 0, \qquad \mathbf{E}[\Delta w(t)]^2 = \Delta, \qquad \mathbf{E}[\Delta w(t)]^4 = 3\Delta^2.
$$

Therefore the Wiener process has zero shift and a constant diffusion coefficient.
Clearly, the process $w(t) + at$ will have shift $a$ and the same diffusion coefficient.

We saw in Sect. 21.2 that the “local” characteristic $Q$ of a Markov process $\xi(t)$
with a discrete state space $\mathcal{X}$ specifies uniquely the evolution law of the process.
A similar situation takes place for diffusion processes: the distribution of the process
is determined uniquely by the coefficients $a(x)$ and $b(x)$. The way to establishing
this fact again lies via the Chapman–Kolmogorov equation.


Theorem 21.6.1 If the transition probability $P(t; x, B)$ of a diffusion process is
twice continuously differentiable with respect to $x$, then $P(t; x, B)$ is differentiable
with respect to $t$ and satisfies the equation

$$
\frac{\partial P}{\partial t} = a \frac{\partial P}{\partial x} + \frac{b^2}{2} \frac{\partial^2 P}{\partial x^2} \tag{21.6.1}
$$

with the initial condition

$$
P(0; x, B) = I_B(x). \tag{21.6.2}
$$

Remark 21.6.1 The conditions of the theorem on smoothness of the transition
function $P$ can actually be proved under the assumption that $a$ and $b$ are continuous,
$b > b_0 > 0$, $|a| < c(|x| + 1)$ and $b^2 < c(|x| + 1)$.




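Proof of Theorem 21.6.1 For brevity’s sake denote by $P'_x$ and $P''_x$ the partial
derivatives $\frac{\partial P}{\partial x}$ and $\frac{\partial^2 P}{\partial x^2}$, respectively, and make use of the relation

$$
P(t; y, B) - P(t; x, B) = (y - x) P'_x + \frac{(y - x)^2}{2} P''_x
+ \frac{(y - x)^2}{2} \bigl[P''_x(t; y_x, B) - P''_x(t; x, B)\bigr], \qquad y_x \in (x, y). \tag{21.6.3}
$$

Then by the Chapman–Kolmogorov equation

$$
\begin{aligned}
P(t + \Delta; x, B) - P(t; x, B) &= \int P(\Delta; x, dy) \bigl[P(t; y, B) - P(t; x, B)\bigr]\\
&= a(x) P'_x \Delta + \frac{b^2(x)}{2} P''_x \Delta + o(\Delta) + R,
\end{aligned}
\tag{21.6.4}
$$

where

$$
R = \int \frac{(y - x)^2}{2} \bigl[P''_x(t; y_x, B) - P''_x(t; x, B)\bigr] P(\Delta; x, dy) = \int_{|y - x| \le \varepsilon} + \int_{|y - x| > \varepsilon}.
$$

The first integral, by virtue of the continuity of $P''_x$, does not exceed

$$
\delta(\varepsilon) \Bigl[\frac{b^2(x)}{2} \Delta + o(\Delta)\Bigr],
$$

where $\delta(\varepsilon) \to 0$ as $\varepsilon \to 0$; the second integral is $o(\Delta)$ by condition (3a). Since $\varepsilon$ is
arbitrary, one has $R = o(\Delta)$ and it follows from the above that

$$
\frac{\partial P}{\partial t} = \lim_{\Delta \to 0} \frac{P(t + \Delta; x, B) - P(t; x, B)}{\Delta} = a P'_x + \frac{b^2}{2} P''_x.
$$

This proves (21.6.1). The theorem is proved.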


It is known from the theory of differential equations that, under wide assumptions
about the coefficients $a$ and $b$ and for $B = (-\infty, z)$, the Cauchy problem (21.6.1)–
(21.6.2) has a unique solution $P$ which is infinitely many times differentiable with
respect to $t$, $x$ and $z$. From this it follows that $P(t; x, B)$ has a density $p(t; x, z)$
which is the fundamental solution of (21.6.1).

It is also not difficult to derive from Theorem 21.6.1 that, along with $P(t; x, B)$,
the function

$$
u(t, x) = \int g(z)\, P(t; x, dz) = \mathbf{E}\bigl[g\bigl(\xi^{(x)}(t)\bigr)\bigr]
$$

will also satisfy Eq. (21.6.1) for any smooth function $g$ with a compact support,
$\xi^{(x)}(t)$ being the diffusion process with the initial value $\xi^{(x)}(0) = x$.

In the proof of Theorem 21.6.1 we considered (see (21.6.4)) the time increment
$\Delta$ preceding the main time interval. In this connection Eqs. (21.6.1) are called
backward Kolmogorov equations. Forward equations can be derived in a similar way.




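Theorem 21.6.2 (Forward Kolmogorov equations) Let the transition density
$p(t; x, y)$ be such that the derivatives

$$
\frac{\partial}{\partial y}\bigl[a(y) p(t; x, y)\bigr] \quad \text{and} \quad \frac{\partial^2}{\partial y^2}\bigl[b^2(y) p(t; x, y)\bigr]
$$

exist and are continuous. Then $p(t; x, y)$ satisfies the equation

$$
Dp := \frac{\partial p}{\partial t} + \frac{\partial}{\partial y}\bigl[a(y) p(t; x, y)\bigr] - \frac{1}{2} \frac{\partial^2}{\partial y^2}\bigl[b^2(y) p(t; x, y)\bigr] = 0. \tag{21.6.5}
$$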


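Proof Let $g(y)$ be a smooth function with a bounded support,

$$
u(t, x) := \mathbf{E}g\bigl(\xi^{(x)}(t)\bigr) = \int g(y)\, p(t; x, y)\, dy.
$$

Then

$$
u(t + \Delta, x) - u(t, x)
= \int p(t; x, z) \Bigl[\int p(\Delta; z, y) g(y)\, dy - \int p(\Delta; z, y) g(z)\, dy\Bigr] dz. \tag{21.6.6}
$$

Expanding the difference $g(y) - g(z)$ into a series, we obtain in the same way as in
the proof of Theorem 21.6.1 that, by virtue of properties (1)–(3), the expression in
the brackets is

$$
\Bigl[a(z) g'(z) + \frac{b^2(z)}{2} g''(z)\Bigr] \Delta + o(\Delta).
$$

This implies that there exists the derivative

$$
\frac{\partial u}{\partial t} = \int p(t; x, z) \Bigl[a(z) g'(z) + \frac{b^2(z)}{2} g''(z)\Bigr] dz.
$$

Integrating by parts we get

$$
\frac{\partial u}{\partial t} = \int \Bigl\{-\frac{\partial}{\partial z}\bigl[a(z) p(t; x, z)\bigr] + \frac{1}{2} \frac{\partial^2}{\partial z^2}\bigl[b^2(z) p(t; x, z)\bigr]\Bigr\} g(z)\, dz
$$

or, which is the same,

$$
\int Dp(t; x, z)\, g(z)\, dz = 0.
$$

Since $g$ is arbitrary, (21.6.5) follows. The theorem is proved.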


As in the case of discrete X, the difference between the forward and backward 
Kolmogorov equations becomes more graphical for non-homogeneous diffusion 
processes, when the transition probabilities P(s, x; t, B) depend on two time vari- 
ables, while a and b in conditions (1)—(3) are functions of s and x. Then the back- 
ward Kolmogorov equation (for densities) will relate the derivatives of the transition 
densities p(s, x; t, y) with respect to the first two variables, while the forward equa- 
tion will hold for the derivatives with respect to the last two variables. 




We return to homogeneous diffusion processes. One can study conditions
ensuring the existence of the limiting stationary distribution of $\xi^{(x)}(t)$ as $t \to \infty$ which
is independent of $x$ using the same approach as in Sect. 21.2. Theorem 21.2.3 will
remain valid (one simply has to replace $i_0$ in it with $x_0$, in agreement with the
notation of the present section). The proof of Theorem 21.2.3 also remains valid, but
will need a somewhat more precise argument (in the new situation, on the event Bg, 
one has $\xi(t - v) \in dx_0$ instead of $\xi(t - v) = x_0$).

If the stationary distribution density

$$
\lim_{t \to \infty} p(t; x, y) = p(y) \tag{21.6.7}
$$

exists, how could one find it? Since the dependence of $p(t; x, y)$ on $t$ and $x$
vanishes as $t \to \infty$, the backward Kolmogorov equations turn into the identity $0 = 0$ as
$t \to \infty$. Turning to the forward equations and passing in (21.6.6) to the limit first as
$t \to \infty$ and then as $\Delta \to 0$, we come, using the same argument as in the proof of
Theorem 21.2.3, to the following conclusion.


Corollary 21.6.1 If (21.6.7) and the conditions of Theorem 21.6.2 hold, then the
stationary density $p(y)$ satisfies the equation

$$
\bigl[a(y) p(y)\bigr]' - \frac{1}{2} \bigl[b^2(y) p(y)\bigr]'' = 0
$$

(which is obtained from (21.6.5) if we put $\frac{\partial p}{\partial t} = 0$).
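Example 21.6.1 The Ornstein–Uhlenbeck process

$$
\xi^{(x)}(t) = x e^{at} + \sigma e^{at}\, w\Bigl(\frac{1 - e^{-2at}}{2a}\Bigr),
$$

where $w(u)$ is the standard Wiener process, is a homogeneous diffusion process
with the transition density

$$
p(t; x, y) = \frac{1}{\sqrt{2\pi}\,\sigma(t)} \exp\Bigl\{-\frac{(y - x e^{at})^2}{2\sigma^2(t)}\Bigr\}, \qquad \sigma^2(t) = \frac{\sigma^2}{2a}\bigl(e^{2at} - 1\bigr). \tag{21.6.8}
$$

We leave it to the reader to verify that this process has coefficients $a(x) = ax$,
$b(x) = \sigma = \text{const}$, and that function (21.6.8) satisfies the forward and backward
equations. For $a < 0$, there exists a stationary process (the definition is given in the
next chapter)

$$
\xi(t) = \sigma e^{at}\, w\Bigl(-\frac{e^{-2at}}{2a}\Bigr),
$$

of which the density (which does not depend on $t$) is equal to

$$
p(y) = \lim_{t \to \infty} p(t; x, y) = \frac{1}{\sqrt{2\pi}\,\sigma(\infty)} \exp\Bigl\{-\frac{y^2}{2\sigma^2(\infty)}\Bigr\}, \qquad \sigma^2(\infty) = -\frac{\sigma^2}{2a}.
$$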



In conclusion of this section we will consider the problem, important for various
applications, of finding the probability that the trajectory of a diffusion process will
not leave a given strip. For simplicity’s sake we confine ourselves to considering
this problem for the Wiener process. Let $c > 0$ and $d < 0$.

Put

$$
\begin{aligned}
U(t; x, B) &:= \mathbf{P}\bigl(w^{(x)}(u) \in (d, c) \text{ for all } u \in [0, t];\ w^{(x)}(t) \in B\bigr)\\
&= \mathbf{P}\Bigl(\sup_{u \le t} w^{(x)}(u) < c,\ \inf_{u \le t} w^{(x)}(u) > d,\ w^{(x)}(t) \in B\Bigr).
\end{aligned}
$$


Leaving out the verification of the fact that the function U is twice continuously 
differentiable, we will only prove the following proposition. 


Theorem 21.6.3 The function $U$ satisfies Eq. (21.6.1) with the initial condition

$$
U(0; x, B) = I_B(x) \tag{21.6.9}
$$

and boundary conditions

$$
U(t; c, B) = U(t; d, B) = 0. \tag{21.6.10}
$$


Proof First of all note that the function $U(t; x, B)$ for $x \in (d, c)$ satisfies conditions
(1)–(3) imposed on the transition function $P(t; x, B)$. Indeed, consider, for instance,
property (1).

We have to verify that

$$
\int_d^c (y - x)\, U(\Delta; x, dy) = \Delta a(x) + o(\Delta) \tag{21.6.11}
$$

(with $a(x) = 0$ in our case). But $U(t; x, B) = P(t; x, B) - V(t; x, B)$, where

$$
V(t; x, B) = \mathbf{P}\Bigl(\Bigl\{\sup_{u \le t} w^{(x)}(u) \ge c \ \text{or} \ \inf_{u \le t} w^{(x)}(u) \le d\Bigr\} \cap \bigl\{w^{(x)}(t) \in B\bigr\}\Bigr),
$$

and

$$
\Bigl|\int_d^c (y - x)\, V(\Delta; x, dy)\Bigr|
\le \max(c, -d) \Bigl[\mathbf{P}\Bigl(\sup_{u \le \Delta} w^{(x)}(u) \ge c\Bigr) + \mathbf{P}\Bigl(\inf_{u \le \Delta} w^{(x)}(u) \le d\Bigr)\Bigr].
$$

The first probability in the brackets is given, as we know (see (20.2.1) and
Theorem 19.2.2), by the value

$$
2\mathbf{P}\bigl(w^{(x)}(\Delta) > c\bigr) = 2\mathbf{P}\Bigl(w(1) > \frac{c - x}{\sqrt{\Delta}}\Bigr) \sim \frac{2}{\sqrt{2\pi}\, z} e^{-z^2/2}, \qquad z = \frac{c - x}{\sqrt{\Delta}}.
$$

For any $x < c$ and $k > 0$, it is $o(\Delta^k)$. The same holds for the second probability.
Therefore (21.6.11) is proved. In the same way one can verify properties (2) and (3).
Further, because by the total probability formula, for $x \in (d, c)$,

$$
U(t + \Delta; x, B) = \int_d^c U(\Delta; x, dy)\, U(t; y, B),
$$

using an expansion of the form (21.6.3) for the function $U$, we obtain in the same
way as in (21.6.4) that

$$
\begin{aligned}
U(t + \Delta; x, B) - U(t; x, B) &= \int U(\Delta; x, dy) \bigl[U(t; y, B) - U(t; x, B)\bigr]\\
&= \Bigl[a(x) \frac{\partial U}{\partial x} + \frac{b^2(x)}{2} \frac{\partial^2 U}{\partial x^2}\Bigr] \Delta + o(\Delta).
\end{aligned}
$$

This implies that $\frac{\partial U}{\partial t}$ exists and that Eq. (21.6.1) holds for the function $U$.
That the boundary and initial conditions are met is obvious. The theorem is
proved.


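The reader can verify that the function

$$
u(t; x, y) := \frac{\partial}{\partial y} U\bigl(t; x, (d, y)\bigr), \qquad y \in (d, c),
$$

playing the role of the fundamental solution to the boundary problem (21.6.9)–
(21.6.10) (the function $u$ satisfies (21.6.1) with the boundary conditions (21.6.10)
and the initial conditions degenerating into the $\delta$-function), is equal to

$$
\begin{aligned}
u(t; x, y) = \frac{1}{\sqrt{2\pi t}} \Bigl\{ &\sum_{k=-\infty}^{\infty} \exp\Bigl[-\frac{[y - x + 2k(c - d)]^2}{2t}\Bigr]
- \sum_{k=0}^{\infty} \exp\Bigl[-\frac{[y + x - 2c - 2k(c - d)]^2}{2t}\Bigr]\\
&- \sum_{k=0}^{\infty} \exp\Bigl[-\frac{[y + x - 2d + 2k(c - d)]^2}{2t}\Bigr] \Bigr\}.
\end{aligned}
$$

This expression can also be obtained directly from probabilistic considerations (see,
e.g., [32]).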
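
A numerical sanity check of such a fundamental solution is straightforward. The sketch below is written under our own assumptions: it uses the classical method-of-images series for the Wiener process killed on leaving $(d, c)$, with the reflected terms written as a single two-sided sum over $k$ (the reflected points $2c - x + 2k(c - d)$, $k \in \mathbb{Z}$), and truncates the series at a hypothetical cut-off `n_terms`. The function name and the sample arguments are illustrative only.

```python
import math


def killed_density(t, x, y, d, c, n_terms=50):
    """Density at y of w^{(x)}(t) on the event that the path stays in (d, c) up to t.

    Method-of-images series, truncated to |k| <= n_terms: sources at
    x + 2k(c - d) and mirror sources at 2c - x + 2k(c - d).
    """
    width = c - d
    total = 0.0
    for k in range(-n_terms, n_terms + 1):
        total += math.exp(-(y - x + 2 * k * width) ** 2 / (2 * t))
        total -= math.exp(-(y + x - 2 * c + 2 * k * width) ** 2 / (2 * t))
    return total / math.sqrt(2 * math.pi * t)


# Hypothetical strip (d, c) = (-1, 1) and time t = 0.5.
at_upper_boundary = killed_density(0.5, 1.0, 0.2, -1.0, 1.0)
at_lower_boundary = killed_density(0.5, -1.0, 0.2, -1.0, 1.0)
interior_value = killed_density(0.5, 0.3, -0.4, -1.0, 1.0)
```

The two boundary values vanish (up to rounding), in agreement with (21.6.10), while the interior value is strictly positive and symmetric in $x$ and $y$.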


Chapter 22 
Processes with Finite Second Moments. 
Gaussian Processes 


Abstract The chapter is devoted to the classical “second-order theory” of time- 
homogeneous processes with finite second moments. Section 22.1 explores the re- 
lationships between the covariance function properties and those of the process itself 
and proves the ergodic theorem (in quadratic mean) for processes with covariance 
functions vanishing at the infinity. Section 22.2 is devoted to the special case of 
Gaussian processes, while Sect. 22.3 solves the best linear prediction problem. 


22.1 Processes with Finite Second Moments 


Let $\{\xi(t), -\infty < t < \infty\}$ be a random process for which there exist the moments
$a(t) = \mathbf{E}\xi(t)$ and $R(t, u) = \mathbf{E}\xi(t)\xi(u)$. Since it is always possible to study the
process $\xi(t) - a(t)$ instead of $\xi(t)$, we can assume without loss of generality that
$a(t) = 0$.


Definition 22.1.1 The function $R(t, u)$ is said to be the covariance function of the
process $\xi(t)$.


Definition 22.1.2 A function $R(t, u)$ is said to be nonnegative (positive) definite if,
for any $k$; $u_1, \ldots, u_k$; $a_1, \ldots, a_k \ne 0$,

$$
\sum_{i,j} a_i a_j R(u_i, u_j) \ge 0 \quad (> 0).
$$

It is evident that the covariance function $R(t, u)$ is nonnegative definite, because

$$
\sum_{i,j} a_i a_j R(u_i, u_j) = \mathbf{E}\Bigl(\sum_i a_i \xi(u_i)\Bigr)^2 \ge 0.
$$

Definition 22.1.3 A process $\xi(t)$ is said to be unpredictable if no linear combination
of the variables $\xi(u_1), \ldots, \xi(u_k)$ is zero with probability 1, i.e. if there exist no
$u_1, \ldots, u_k$; $a_1, \ldots, a_k$ such that

$$
\mathbf{P}\Bigl(\sum_i a_i \xi(u_i) = 0\Bigr) = 1.
$$
i 
A.A. Borovkov, Probability Theory, Universitext, 611 


DOI 10.1007/978-1-4471-5201-9_22, © Springer-Verlag London 2013 




If R(t, u) is the covariance function of an unpredictable process, then R(t, u) 
is positive definite. We will see below that the converse assertion is also true in a 
certain sense. 

Unpredictability means that we cannot represent $\xi(t_k)$ as a linear combination of
$\xi(t_j)$, $j < k$.


Example 22.1.1 The process $\xi(t) = \sum_{k=1}^{n} \xi_k g_k(t)$, where $g_k(t)$ are linearly
independent and $\xi_k$ are independent, is not unpredictable, because from $\xi(t_1), \ldots, \xi(t_n)$
we can determine the values $\xi(t)$ for all other $t$.

Consider the Hilbert space $L_2$ of all random variables $\eta$ on $\langle \Omega, \mathfrak{F}, \mathbf{P} \rangle$ having
finite second moments, $\mathbf{E}\eta = 0$, endowed with the inner product $(\eta_1, \eta_2) = \mathbf{E}\eta_1\eta_2$
corresponding to the distance $\|\eta_1 - \eta_2\| = [\mathbf{E}(\eta_1 - \eta_2)^2]^{1/2}$. Convergence in $L_2$ is
obviously convergence in mean quadratic.

A random process $\xi(t)$ may be thought of as a curve in $L_2$.


Definition 22.1.4 A random process $\xi(t)$ is said to be wide sense stationary if the
function $R(t, u) =: R(t - u)$ depends on the difference $t - u$ only. The function
$R(s)$ is called nonnegative (positive) definite if the function $R(t, t + s)$ is of the
respective type. For brevity, we will often call wide sense stationary processes simply
stationary.


For the Wiener process, $R(t, u) = \mathbf{E}w(t)w(u) = \min(t, u)$, so that $w(t)$ cannot
be stationary. But the process $\xi(t) = w(t + 1) - w(t)$ will already be stationary.

It is obvious that, for a stationary process, the function $R(s)$ is even and $\mathbf{E}\xi^2(t) =
R(0) = \text{const}$. For simplicity’s sake, put $R(0) = 1$. Then, by the
Cauchy–Bunjakovsky inequality,

$$
|R(s)| = \bigl|\mathbf{E}\xi(t)\xi(t + s)\bigr| \le \bigl[\mathbf{E}\xi^2(t)\, \mathbf{E}\xi^2(t + s)\bigr]^{1/2} = R(0) = 1.
$$
Theorem 22.1.1 

(1) A process $\xi(t)$ is continuous in mean quadratic ($\xi(t + \Delta) \xrightarrow{(2)} \xi(t)$ as $\Delta \to 0$) 
if and only if the function $R(u)$ is continuous at zero. 
(2) If the function $R(u)$ is continuous at zero, then it is continuous everywhere. 


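Proof 
(1) $\|\xi(t + \Delta) - \xi(t)\|^2 = \mathbf{E}(\xi(t + \Delta) - \xi(t))^2 = 2R(0) - 2R(\Delta)$. 
(2) 
\[ \begin{aligned} R(t + \Delta) - R(t) &= \mathbf{E}\bigl(\xi(t + \Delta)\xi(0) - \xi(t)\xi(0)\bigr) \\ &= (\xi(0), \xi(t + \Delta) - \xi(t)) \le \|\xi(t + \Delta) - \xi(t)\| \\ &= \sqrt{2(R(0) - R(\Delta))}. \end{aligned} \tag{22.1.1} \]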


The theorem is proved. 


A process $\xi(t)$ continuous in mean quadratic will be stochastically continuous, 
as we can see from Chaps. 6 and 18. The continuity in mean quadratic does not, 
however, imply path-wise continuity. The reader can verify this by considering the 
example of the process 

\[ \xi(t) = \eta(t + 1) - \eta(t) - 1, \]

where $\eta(t)$ is the Poisson process with parameter 1. For that process, the covariance 
function 

\[ R(t) = \begin{cases} 0 & \text{for } t > 1, \\ 1 - t & \text{for } 0 \le t \le 1 \end{cases} \]

is continuous, although the trajectories of $\xi(t)$ are not. If 

\[ |R(\Delta) - R(0)| < c\,\Delta^{1+\varepsilon} \tag{22.1.2} \]

for some $\varepsilon > 0$ then, by the Kolmogorov theorem (see Theorem 18.2.1), $\xi(t)$ has 
a continuous modification. From this it follows, in particular, that if $R(t)$ is twice 
differentiable at the point $t = 0$, then the trajectories of $\xi(t)$ may be assumed 
continuous. Indeed, in that case, since $R(t)$ is even, one has 

\[ R'(0) = 0 \quad\text{and}\quad R(\Delta) - R(0) \sim \tfrac12 R''(0)\Delta^2. \]


As a whole, the smoother the covariance function is at zero, the smoother the 
trajectories of $\xi(t)$ are. 

Assume that the trajectories of $\xi(t)$ are measurable (for example, belong to the 
space $D$). 


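Theorem 22.1.2 (The simplest ergodic theorem) If 

\[ R(s) \to 0 \quad\text{as } s \to \infty, \tag{22.1.3} \]

then 

\[ \xi_T := \frac1T \int_0^T \xi(t)\,dt \xrightarrow{(2)} 0. \]

Proof Clearly, 

\[ \|\xi_T\|^2 = \frac{1}{T^2} \int_0^T\!\!\int_0^T R(t - u)\,dt\,du. \]

Since $R(s)$ is even, 

\[ J := \int_0^T\!\!\int_0^T R(t - u)\,dt\,du = 2\int_0^T\!\!\int_u^T R(t - u)\,dt\,du. \]

Making the orthogonal change of variables $v = (t - u)/\sqrt2$, $s = (t + u)/\sqrt2$, we 
obtain 

\[ J \le 2\int_{s=0}^{T\sqrt2}\int_{v=0}^{T/\sqrt2} R(v\sqrt2)\,dv\,ds = 2T\int_0^T R(v)\,dv, \]

\[ \|\xi_T\|^2 \le \frac2T \int_0^T R(v)\,dv \to 0. \]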


The theorem is proved. 
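
The decay of the mean square of the time average can be observed numerically. The sketch below is an illustration only: it assumes the covariance $R(s) = \max(0, 1 - |s|)$ (that of the process $w(t + 1) - w(t)$ mentioned earlier), uses the exact reduction $T^{-2}\int_0^T\!\int_0^T R(t - u)\,dt\,du = 2T^{-2}\int_0^T (T - v)R(v)\,dv$ valid for even $R$, and evaluates the one-dimensional integral with a midpoint rule. The function names are hypothetical.

```python
def tent_cov(s):
    """Assumed covariance R(s) = max(0, 1 - |s|); it vanishes for |s| >= 1."""
    return max(0.0, 1.0 - abs(s))


def mean_square_of_time_average(T, cov=tent_cov, steps=20000):
    """Approximate (1/T^2) * double integral of cov(t - u) over [0, T]^2.

    For an even covariance this equals (2/T^2) * int_0^T (T - v) cov(v) dv,
    which is computed here with the midpoint rule on `steps` subintervals.
    """
    h = T / steps
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * h
        total += (T - v) * cov(v)
    return 2.0 * total * h / T**2


if __name__ == "__main__":
    # The values should decrease towards zero as T grows.
    for T in (1.0, 10.0, 100.0):
        print(T, mean_square_of_time_average(T))
```

For this particular covariance and $T \ge 1$ the integral can also be done by hand, giving $1/T - 1/(3T^2)$, which the numerical values should reproduce.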




Example 22.1.2 The stationary white noise process $\xi(t)$ is defined as a process 
with independent values, i.e. a process such that, for any $t_1,\dots,t_n$, the variables 
$\xi(t_1),\dots,\xi(t_n)$ are independent. For such a process, 

\[ R(t) = \begin{cases} 1 & \text{for } t = 0, \\ 0 & \text{for } t \ne 0, \end{cases} \]

and thus condition (22.1.3) is met. However, one cannot apply Theorem 22.1.2 here, 
for the trajectories of $\xi(t)$ will be non-measurable with probability 1 (for example, 
the set $B = \{t : \xi(t) > 0\}$ is non-measurable with probability 1). 


Definition 22.1.5 A process $\xi(t)$ is said to be strict sense stationary if, for any 
$t_1,\dots,t_k$, the distribution of $(\xi(t_1 + u), \xi(t_2 + u),\dots,\xi(t_k + u))$ is independent 
of $u$. 

It is obvious that if $\xi(t)$ is a strict sense stationary process then 

\[ \mathbf{E}\xi(t)\xi(u) = \mathbf{E}\xi(t - u)\xi(0) = R(t - u), \]

and $\xi(t)$ will be wide sense stationary. The converse is, of course, not true. However, 
there exists a class of processes for which both concepts of stationarity coincide. 


22.2 Gaussian Processes 


Definition 22.2.1 A process $\xi(t)$ is said to be Gaussian if its finite-dimensional 
distributions are normal. 

We again assume that $\mathbf{E}\xi(t) = 0$ and $R(t, u) = \mathbf{E}\xi(t)\xi(u)$. 
The finite-dimensional distributions are completely determined by the ch.f.s 
($\lambda = (\lambda_1,\dots,\lambda_k)$, $\xi = (\xi(t_1),\dots,\xi(t_k))$) 

\[ \mathbf{E}e^{i(\lambda,\xi)} = \mathbf{E}e^{i\sum_j \lambda_j\xi(t_j)} = e^{-\frac12\lambda R\lambda^T}, \]

where $R = \|R(t_i, t_j)\|$ and the superscript $T$ stands for transposition, so that 

\[ \lambda R\lambda^T = \sum_{i,j} \lambda_i\lambda_j R(t_i, t_j). \]

Thus for a Gaussian process the finite-dimensional distributions are completely 
determined by the covariance function $R(t, u)$. 


We saw that for an unpredictable process $\xi(t)$, the function $R(t, u)$ is positive 
definite. A converse assertion may be stated in the following form. 


Theorem 22.2.1 If the function $R(t, u)$ is positive definite, then there exists an 
unpredictable Gaussian process with the covariance function $R(t, u)$. 




Proof For arbitrary $t_1,\dots,t_k$, define the finite-dimensional distribution of the vector 
$\xi(t_1),\dots,\xi(t_k)$ via the density 

\[ p_{t_1,\dots,t_k}(x_1,\dots,x_k) = \frac{\sqrt{|A|}}{(2\pi)^{k/2}} \exp\Bigl\{-\frac12 xAx^T\Bigr\}, \]

where $A$ is the matrix inverse to the covariance matrix $R = \|R(t_i, t_j)\|$ (see 
Sect. 7.6) and $|A|$ is the determinant of $A$. These distributions will clearly 
be consistent, because the covariance matrices are consistent (the matrix for 
$\xi(t_1),\dots,\xi(t_{k-1})$ is a submatrix of $R$). It remains to make use of the Kolmogorov 
theorem. The theorem is proved. 


Example 22.2.1 Let $w(t)$ be the standard Wiener process. The process 

\[ w^0(t) = w(t) - tw(1), \quad t \in [0, 1], \]

is called the Brownian bridge (its “ends are fixed”: $w^0(0) = w^0(1) = 0$). The 
covariance function of $w^0(t)$ is equal to 

\[ R(t, u) = \mathbf{E}(w(t) - tw(1))(w(u) - uw(1)) = t(1 - u) \]

for $u \ge t$. 
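
For completeness, the covariance of the Brownian bridge can be expanded term by term. This is only a sketch of the computation, using $\mathbf{E}w(t)w(u) = \min(t, u)$ and assuming $0 \le t \le u \le 1$:

```latex
\begin{aligned}
\mathbf{E}\bigl(w(t) - t\,w(1)\bigr)\bigl(w(u) - u\,w(1)\bigr)
  &= \mathbf{E}w(t)w(u) - u\,\mathbf{E}w(t)w(1) - t\,\mathbf{E}w(1)w(u) + tu\,\mathbf{E}w^2(1) \\
  &= t - ut - tu + tu \\
  &= t(1 - u).
\end{aligned}
```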


A Gaussian wide sense stationary process $\xi(t)$ is strict sense stationary. This 
immediately follows from the fact that for $R(t, u) = R(t - u)$ the finite-dimensional 
distributions of $\xi(t)$ become invariant with respect to time shift: 

\[ p_{t_1,\dots,t_k}(x_1,\dots,x_k) = p_{t_1+u,\dots,t_k+u}(x_1,\dots,x_k), \]

since $\|R(t_i + u, t_j + u)\| = \|R(t_i, t_j)\|$. 

If $\xi(t)$ is a Gaussian process, then conditions ensuring the smoothness of its 
trajectories can be substantially relaxed in comparison with (22.1.2). 

Let for simplicity’s sake the Gaussian process $\xi(t)$ be stationary. 


Theorem 22.2.2 If, for $h < 1$, 

\[ |R(h) - R(0)| < c\Bigl(\log\frac1h\Bigr)^{-\alpha}, \quad \alpha > 3,\ c < \infty, \]

then the trajectories of $\xi(t)$ can be assumed continuous. 


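Proof We make use of Theorem 18.2.2 and put $\varepsilon(h) = (\log\frac1h)^{-\beta}$ for $1 < \beta < 
(\alpha - 1)/2$ (we take logarithms to the base 2). Then 

\[ \sum_{n=1}^{\infty} \varepsilon(2^{-n}) = \sum_{n=1}^{\infty} n^{-\beta} < \infty, \]

and, by (22.1.1), 

\[ \begin{aligned} \mathbf{P}\bigl(|\xi(t + h) - \xi(t)| > \varepsilon(h)\bigr) &= 2\Bigl(1 - \Phi\Bigl(\frac{\varepsilon(h)}{\sqrt{2(R(0) - R(h))}}\Bigr)\Bigr) \\ &\le 2\Bigl(1 - \Phi\Bigl(\frac{1}{\sqrt{2c}}\Bigl(\log\frac1h\Bigr)^{\alpha/2 - \beta}\Bigr)\Bigr). \end{aligned} \tag{22.2.1} \]

Since the argument of $\Phi$ increases unboundedly as $h \to 0$, $\gamma = \alpha - 2\beta > 1$, and 
by (19.3.1) 

\[ 1 - \Phi(x) \sim \frac{1}{\sqrt{2\pi}\,x}\,e^{-x^2/2} \quad\text{as } x \to \infty, \]

we see that the right-hand side of (22.2.1) does not exceed 

\[ q(h) := c_1\Bigl(\log\frac1h\Bigr)^{\beta - \alpha/2} \exp\Bigl\{-c_2\Bigl(\log\frac1h\Bigr)^{\alpha - 2\beta}\Bigr\}, \]

so that 

\[ \sum_{n=1}^{\infty} 2^n q(2^{-n}) = c_1\sum_{n=1}^{\infty} n^{\beta - \alpha/2} \exp\{-c_2 n^{\gamma} + n\ln 2\} < \infty, \]

because $c_2 > 0$ and $\gamma > 1$. The conditions of Theorem 18.2.2 are met, and so 
Theorem 22.2.2 is proved. 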
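
The tail asymptotics of the normal law used in the proof can be checked numerically. The following is a minimal sketch with hypothetical function names; it relies only on the standard identity $1 - \Phi(x) = \frac12\operatorname{erfc}(x/\sqrt2)$ and compares the exact tail with the leading term $e^{-x^2/2}/(\sqrt{2\pi}\,x)$.

```python
import math


def normal_tail(x):
    """1 - Phi(x) for the standard normal law, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def tail_asymptotic(x):
    """Leading-order approximation exp(-x^2 / 2) / (sqrt(2 * pi) * x), for x > 0."""
    return math.exp(-0.5 * x * x) / (math.sqrt(2.0 * math.pi) * x)


if __name__ == "__main__":
    # The ratio stays below 1 and approaches 1 as x grows.
    for x in (1.0, 2.0, 5.0, 10.0):
        print(x, normal_tail(x) / tail_asymptotic(x))
```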


22.3 Prediction Problem 


Suppose the distribution of a process $\xi(t)$ is known, and one is given the trajectory of 
$\xi(t)$ on a set $B \subset (-\infty, t]$, $B$ being either an interval or a finite collection of points. 
What could be said about the value $\xi(t + u)$? Our aim will be to find a random 
variable $\zeta$, which is $\mathfrak{F}_B = \sigma(\xi(v), v \in B)$-measurable (and called a prediction) and 
such that $\mathbf{E}(\xi(t + u) - \zeta)^2$ assumes the smallest possible value. The answer to that 
problem is actually known (see Sect. 4.8): 

\[ \zeta = \mathbf{E}(\xi(t + u) \mid \mathfrak{F}_B). \]

Let $\xi(t)$ be a Gaussian process, $B = \{t_1,\dots,t_k\}$, $t_1 < t_2 < \cdots < t_k < t_0 = t + u$, 
$A = (\sigma^2)^{-1} = \|a_{ij}\|$ and $\sigma^2 = \|\mathbf{E}\xi(t_i)\xi(t_j)\|_{i,j=1,\dots,k,0}$. Then the distribution of 
the vector $(\xi(t_1),\dots,\xi(t_0))$ has the density 

\[ f(x_1,\dots,x_k,x_0) = \frac{\sqrt{|A|}}{(2\pi)^{(k+1)/2}} \exp\Bigl\{-\frac12\sum_{i,j} x_i x_j a_{ij}\Bigr\}, \]

and the conditional distribution of $\xi(t_0)$ given $\xi(t_1),\dots,\xi(t_k)$ has density equal to 
the ratio 

\[ \frac{f(x_1,\dots,x_k,x_0)}{\int_{-\infty}^{\infty} f(x_1,\dots,x_k,x_0)\,dx_0}. \]




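The exponential part of this ratio has the form 

\[ \exp\Bigl\{-\frac{a_{00}}{2}x_0^2 - \sum_{j=1}^{k} x_0 x_j a_{j0}\Bigr\}. \]

This means that the conditional distribution under consideration is the normal 
law $\Phi_{\alpha,d^2}$, where 

\[ \alpha = -\sum_{j=1}^{k} \frac{x_j a_{j0}}{a_{00}}, \qquad d^2 = \frac{1}{a_{00}}. \]

Thus, in our case the best prediction $\zeta$ is equal to 

\[ \zeta = -\sum_{j=1}^{k} \frac{\xi(t_j)\,a_{j0}}{a_{00}}. \]

The mean quadratic error of this prediction equals $\sqrt{1/a_{00}}$. 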
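
As a quick check of these formulas (an illustration added here, not taken from the text), consider the Wiener process with a single observation point, $k = 1$ and $0 < t_1 < t_0$, so that $\mathbf{E}w(t_i)w(t_j) = \min(t_i, t_j)$. With the indices ordered as $(1, 0)$, a direct computation would give:

```latex
\begin{gathered}
\sigma^2 = \begin{pmatrix} t_1 & t_1 \\ t_1 & t_0 \end{pmatrix}, \qquad
A = (\sigma^2)^{-1} = \frac{1}{t_1 (t_0 - t_1)}
    \begin{pmatrix} t_0 & -t_1 \\ -t_1 & t_1 \end{pmatrix}, \\[4pt]
a_{10} = -\frac{1}{t_0 - t_1}, \qquad a_{00} = \frac{1}{t_0 - t_1}, \qquad
\zeta = -\frac{w(t_1)\, a_{10}}{a_{00}} = w(t_1), \qquad \frac{1}{a_{00}} = t_0 - t_1 .
\end{gathered}
```

So in this case the prediction is simply the last observed value, and the conditional variance $t_0 - t_1$ agrees with the independence of the increments of $w$.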

We have obtained a linear prediction. In the general case, the linearity property 
is usually violated. 

Consider now the problem of the best linear prediction in the case of an arbitrary 
process $\xi(t)$ with finite second moments. For simplicity’s sake we assume again that 
$B = \{t_1,\dots,t_k\}$. 

Denote by $H(\xi)$ the subspace of $L_2$ generated by the random variables $\xi(t)$, 
$-\infty < t < \infty$, and by $H_B(\xi)$ the subspace of $H(\xi)$ generated (or spanned) by 
$\xi(t_1),\dots,\xi(t_k)$. Elements of $H_B(\xi)$ have the form 

\[ \sum_{j=1}^{k} a_j\xi(t_j). \]

The existence and the form of the best linear prediction in this case are 
established by the following assertion. 

Theorem 22.3.1 There exists a unique point $\zeta \in H_B(\xi)$ (the projection of $\xi(t + u)$ 
onto $H_B(\xi)$, see Fig. 22.1) such that 

\[ \xi(t + u) - \zeta \perp H_B(\xi). \tag{22.3.1} \]

Relation (22.3.1) is equivalent to 

\[ \|\xi(t + u) - \zeta\| = \min_{\theta \in H_B(\xi)} \|\xi(t + u) - \theta\|. \tag{22.3.2} \]

Explicit formulas for the coefficients $a_j$ in the representation $\zeta = \sum a_j\xi(t_j)$ are 
given in the proof. 

Proof Relation (22.3.1) is equivalent to the equations 

\[ (\xi(t + u) - \zeta, \xi(t_j)) = 0, \quad j = 1,\dots,k. \]


Fig. 22.1 Illustration to Theorem 22.3.1: the point $\zeta$ is the projection of $\xi(t + u)$ onto $H_B(\xi)$ 


Substituting here 

\[ \zeta = \sum_{l=1}^{k} a_l\xi(t_l) \in H_B(\xi), \]

we obtain 

\[ R(t + u, t_j) = \sum_{l=1}^{k} a_l R(t_j, t_l), \quad j = 1,\dots,k, \tag{22.3.3} \]

or, in vector form, $R_{t+u} = aR$, where 

\[ a = (a_1,\dots,a_k), \quad R_{t+u} = (R(t + u, t_1),\dots,R(t + u, t_k)), \quad R = \|R(t_i, t_j)\|. \]

If the process $\xi(t)$ is unpredictable, then the matrix $R$ is non-degenerate and 
Eq. (22.3.3) has a unique solution: 

\[ a = R_{t+u}R^{-1}. \tag{22.3.4} \]

If $\xi(t)$ is not unpredictable, then either $R^{-1}$ still exists and then (22.3.4) holds, or 
$R$ is degenerate. In that case, one has to choose from the collection $\xi(t_1),\dots,\xi(t_k)$ 
only $l < k$ linearly independent elements for which all the above remains true after 
replacing $k$ with $l$. 

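The equivalence of (22.3.1) and (22.3.2) follows from the following 
considerations. Let $\theta$ be any other element of $H_B(\xi)$. Then 

\[ \eta := \theta - \zeta \in H_B(\xi), \quad \eta \perp \xi(t + u) - \zeta, \]

so that 

\[ \|\xi(t + u) - \theta\|^2 = \|\xi(t + u) - \zeta\|^2 + \|\eta\|^2 \ge \|\xi(t + u) - \zeta\|^2. \]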


The theorem is proved. 
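
The normal equations (22.3.3) are straightforward to solve numerically. The sketch below is an illustration under stated assumptions rather than the author's procedure: the helper names are hypothetical, the linear system is solved by plain Gaussian elimination, and the covariance $R(t - u) = e^{-|t - u|}$ is chosen only as a convenient positive definite example.

```python
import math


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is (numerically) degenerate")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][j] * x[j] for j in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def best_linear_prediction_coeffs(cov, times, target):
    """Coefficients a_1..a_k solving R(target, t_j) = sum_l a_l R(t_j, t_l), j = 1..k."""
    R = [[cov(tj, tl) for tl in times] for tj in times]
    rhs = [cov(target, tj) for tj in times]
    return solve_linear_system(R, rhs)


def exp_cov(t, u):
    """Illustrative stationary covariance exp(-|t - u|) (an assumption, not from the text)."""
    return math.exp(-abs(t - u))


if __name__ == "__main__":
    # Observation points t_1 = 0, t_2 = 1; the value at time 2 is predicted.
    print(best_linear_prediction_coeffs(exp_cov, [0.0, 1.0], 2.0))
```

For this covariance only the most recent observation receives a non-zero weight, namely $e^{-1}$, as one can verify by substituting $a = (0, e^{-1})$ into (22.3.3).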


Remark 22.3.1 It can happen (in the case where the process $\xi(t)$ is not 
unpredictable) that $\xi(t + u) \in H_B(\xi)$. Then the error of the prediction $\zeta$ will be equal 
to zero. 


Appendix 1 
Extension of a Probability Measure 


In this appendix we will prove Carathéodory’s theorem, which was used in Sect. 2.1. 

Let $\mathcal{A}$ be an algebra of subsets of $\Omega$ on which a probability measure $\mathbf{P}$, i.e., a 
real-valued function satisfying conditions P1–P3 of Chap. 2, is given. Let $\mathcal{P}$ denote the 
class of all subsets of $\Omega$. For any $A \in \mathcal{P}$, there always exists a sequence $\{A_n\}_{n=1}^{\infty}$ 
of disjoint sets from $\mathcal{A}$ such that $\bigcup_{n=1}^{\infty} A_n \supset A$ (it suffices to take $A_1 = \Omega$ and 
$A_n = \varnothing$, $n \ge 2$). Denote by $\gamma(A)$ the class of all such sequences and introduce on 
$\mathcal{P}$ the real-valued function 

\[ \mathbf{P}^*(A) := \inf\Bigl\{\sum_{n=1}^{\infty} \mathbf{P}(A_n);\ \{A_n\} \in \gamma(A)\Bigr\}. \]


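This function (the outer measure on $\mathcal{P}$ induced by the measure $\mathbf{P}$ on $\mathcal{A}$) has the 
following properties: 

(1) $\mathbf{P}^*(A) \le \mathbf{P}^*(B) \le 1$ if $A \subset B$. 
(2) $\mathbf{P}^*(\bigcup_{n=1}^{\infty} A_n) = \sum_{n=1}^{\infty} \mathbf{P}(A_n)$ if the sets $A_n \in \mathcal{A}$, $n = 1, 2, \dots$, are disjoint. 
(3) $\mathbf{P}^*(\bigcup_{n=1}^{\infty} A_n) \le \sum_{n=1}^{\infty} \mathbf{P}^*(A_n)$ for any $A_1, A_2, \dots \in \mathcal{P}$. 

Property (1) is obvious. Property (2) is established by the following argument. 
Let $\{B_m\}$ be any sequence from $\gamma(A)$, where $A = \bigcup_{n=1}^{\infty} A_n$. Since $\bigcup_{m=1}^{\infty} A_n B_m = 
A_n \in \mathcal{A}$, one has $\mathbf{P}(A_n) = \sum_{m=1}^{\infty} \mathbf{P}(A_n B_m)$. Therefore, 

\[ \sum_{n=1}^{\infty} \mathbf{P}(A_n) = \sum_{n=1}^{\infty}\sum_{m=1}^{\infty} \mathbf{P}(A_n B_m) = \sum_{m=1}^{\infty}\sum_{n=1}^{\infty} \mathbf{P}(A_n B_m). \]

But, for each $N < \infty$, 

\[ \sum_{n=1}^{N} \mathbf{P}(A_n B_m) \le \mathbf{P}(B_m). \]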


A.A. Borovkov, Probability Theory, Universitext, 619 
DOI 10.1007/978-1-4471-5201-9, © Springer-Verlag London 2013 




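Hence this inequality holds for $N = \infty$ as well and, for any sequence $\{B_m\} \in \gamma(A)$, 

\[ \sum_{n=1}^{\infty} \mathbf{P}(A_n) \le \sum_{m=1}^{\infty} \mathbf{P}(B_m). \]

This implies that $\mathbf{P}^*(A) \ge \sum_{n=1}^{\infty} \mathbf{P}(A_n)$. Because the converse inequality is obvious, 
we have $\mathbf{P}^*(A) = \sum_{n=1}^{\infty} \mathbf{P}(A_n)$. 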


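Proof of property (3) Consider, for some $\varepsilon > 0$, sequences $\{A_{nk}\}_{k=1}^{\infty} \in \gamma(A_n)$ such 
that 

\[ \sum_{k=1}^{\infty} \mathbf{P}(A_{nk}) \le \mathbf{P}^*(A_n) + \frac{\varepsilon}{2^n}. \]

The sequence of sets $\{A_{nk}\}$ clearly contains $\bigcup A_n$ and therefore 

\[ \mathbf{P}^*\Bigl(\bigcup_n A_n\Bigr) \le \sum_{n,k} \mathbf{P}(A_{nk}) \le \sum_{n=1}^{\infty} \mathbf{P}^*(A_n) + \varepsilon. \]

Since $\varepsilon$ is arbitrary, property (3) is proved. 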


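Introduce now the binary operation of symmetric difference $\oplus$ on arbitrary sets 
$A$ and $B$ from $\mathcal{P}$ by means of the equality 

\[ A \oplus B := A\overline{B} \cup \overline{A}B. \]

It is not hard to see that 

\[ A \oplus B = B \oplus A = \overline{A} \oplus \overline{B} \subset A \cup B, \quad A \oplus A = \varnothing, \quad A \oplus \varnothing = A, \quad (A \oplus B) \oplus C = A \oplus (B \oplus C). \]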


With the help of this operation and the function $\mathbf{P}^*$, we introduce on $\mathcal{P}$ a distance $\rho$ 
by putting, for any $A, B \in \mathcal{P}$, 

\[ \rho(A, B) := \mathbf{P}^*(A \oplus B). \]

This construction is quite similar to the one used in Sect. 3.4 (we considered there 
the distance $d(A, B) = \mathbf{P}(A \oplus B)$ between measurable sets $A$ and $B$). The properties 
of the distance $\rho$ are the same as in (3.4.2). We will need the following properties: 

(1) $\rho(A, B) = \rho(B, A) \ge 0$, $\rho(A, A) = 0$, 
(2) $\rho(\overline{A}, \overline{B}) = \rho(A, B)$, 
(3) $\rho(AB, CD) \le \rho(A, C) + \rho(B, D)$, 
(4) $\rho(\bigcup A_k, \bigcup B_k) \le \sum_k \rho(A_k, B_k)$. 


We also note that 




(5) $|\mathbf{P}^*(A) - \mathbf{P}^*(B)| \le \rho(A, B)$, and therefore $\mathbf{P}^*(\cdot)$ is a uniformly continuous 
function with respect to $\rho$. 
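
On a finite space the outer measure and the distance can be computed by brute force, which gives a concrete feel for these properties. The sketch below is a toy illustration with hypothetical names and numbers: the algebra consists of all unions of three atoms, so the cheapest cover of a set by algebra elements is the union of the atoms that meet it.

```python
from itertools import combinations

OMEGA = frozenset(range(6))
# Hypothetical algebra: all unions of these atoms, with the listed probabilities.
ATOMS = {frozenset({0, 1}): 0.5, frozenset({2, 3}): 0.3, frozenset({4, 5}): 0.2}


def outer_measure(subset):
    """P*(A): total probability of the atoms that meet A (the cheapest cover of A)."""
    return sum(p for atom, p in ATOMS.items() if atom & subset)


def rho(a, b):
    """Distance rho(A, B) = P*(A symmetric-difference B)."""
    return outer_measure(a ^ b)


def all_subsets(universe):
    """Yield every subset of the (finite) universe as a frozenset."""
    items = sorted(universe)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


if __name__ == "__main__":
    a, b = frozenset({0}), frozenset({0, 1})
    print(outer_measure(a), outer_measure(b), rho(a, b))
```

Looping over `all_subsets(OMEGA)` makes it possible to confirm, for this toy space, that $|\mathbf{P}^*(A) - \mathbf{P}^*(B)|$ never exceeds the distance between $A$ and $B$, and that passing to complements leaves the distance unchanged.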


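Properties (1)–(3) were listed in (3.4.2); in the present context, they are proved in 
exactly the same way based on the properties of the measure $\mathbf{P}^*$. Property (4) follows 
from property (3) of the measure $\mathbf{P}^*$ and the relation (we put here $A = \bigcup A_n$ and 
$B = \bigcup B_n$) 

\[ A \oplus B \subset \bigcup (A_n \oplus B_n), \]

because 

\[ \begin{aligned} A \oplus B &= \Bigl[\Bigl(\bigcup A_n\Bigr) \cap \Bigl(\bigcap \overline{B}_n\Bigr)\Bigr] \cup \Bigl[\Bigl(\bigcap \overline{A}_n\Bigr) \cap \Bigl(\bigcup B_n\Bigr)\Bigr] \\ &\subset \Bigl[\bigcup A_n\overline{B}_n\Bigr] \cup \Bigl[\bigcup B_n\overline{A}_n\Bigr] = \bigcup (A_n\overline{B}_n \cup \overline{A}_n B_n) = \bigcup (A_n \oplus B_n). \end{aligned} \]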


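Property (5) follows from the fact that 

\[ A \subset B \cup (A \oplus B), \quad B \subset A \cup (A \oplus B) \tag{A1.1} \]

and therefore 

\[ \mathbf{P}^*(A) - \mathbf{P}^*(B) \le \mathbf{P}^*(A \oplus B) = \rho(A, B), \qquad \mathbf{P}^*(B) - \mathbf{P}^*(A) \le \mathbf{P}^*(A \oplus B) = \rho(A, B). \]

Similarly to the terminology adopted in Sect. 3.4 we call a set $A \in \mathcal{P}$ approximable 
if there exists a sequence $A_n \in \mathcal{A}$ for which $\rho(A, A_n) \to 0$. The totality of all 
approximable sets we denote by $\mathfrak{A}$. This is clearly the closure of $\mathcal{A}$ with respect to $\rho$. 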


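Lemma A1.1 $\mathfrak{A}$ is a $\sigma$-algebra. 


Proof We verify that $\mathfrak{A}$ satisfies properties A1, A2′ and A3 of $\sigma$-algebras of Chap. 2. 
Property A1: $\Omega \in \mathfrak{A}$ is obvious, for $\mathcal{A} \subset \mathfrak{A}$. Property A3 ($\overline{A} \in \mathfrak{A}$ if $A \in \mathfrak{A}$) follows 
from the fact that, for $A \in \mathfrak{A}$, there exist $A_n \in \mathcal{A}$ such that, as $n \to \infty$, 

\[ \rho(A, A_n) \to 0, \quad \rho(\overline{A}, \overline{A}_n) = \rho(A, A_n) \to 0. \]

Finally, consider property A2′. We show first that if $A_n \in \mathcal{A}$, then $A = \bigcup A_n \in \mathfrak{A}$. 
Indeed, we can assume without loss of generality that the $A_n$ are disjoint. Then, 
by virtue of the properties of the measure $\mathbf{P}^*$, for any $\varepsilon > 0$, 

\[ \sum_{k=1}^{\infty} \mathbf{P}(A_k) \le \mathbf{P}^*(\Omega) = 1, \]

\[ \rho\Bigl(A, \bigcup_{k=1}^{n} A_k\Bigr) = \mathbf{P}^*\Bigl(\bigcup_{k=n+1}^{\infty} A_k\Bigr) = \sum_{k=n+1}^{\infty} \mathbf{P}(A_k) < \varepsilon \]

for $n$ large enough. 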




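Now let $A_n \in \mathfrak{A}$. We have to show that 

\[ A = \bigcup_{n=1}^{\infty} A_n \in \mathfrak{A}. \]

Let $\{B_n\}$ be a sequence of sets from $\mathcal{A}$ such that $\rho(A_n, B_n) < \varepsilon/2^n$. Then one has 
$B = \bigcup B_n \in \mathfrak{A}$ and, by property (4) of the distance $\rho$, 

\[ \rho(A, B) \le \sum_{n=1}^{\infty} \rho(A_n, B_n) < \varepsilon. \]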


The lemma is proved. 


Now we can prove the main assertion.¹ 


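Theorem A1.1 The probability $\mathbf{P}$ can be extended from the algebra $\mathcal{A}$ to some 
probability $\overline{\mathbf{P}}$ given on the $\sigma$-algebra $\mathfrak{A}$. 

Proof For $A \in \mathfrak{A}$, put 

\[ \overline{\mathbf{P}}(A) := \mathbf{P}^*(A). \]

It is evident that $\overline{\mathbf{P}}(A) = \mathbf{P}(A)$ for $A \in \mathcal{A}$, and $\overline{\mathbf{P}}(\Omega) = 1$. To verify that $\overline{\mathbf{P}}$ is a 
probability we just have to prove the countable additivity of $\overline{\mathbf{P}}$. We first prove the 
finite additivity. It suffices to prove it for two sets: 

\[ \mathbf{P}^*(A \cup B) = \mathbf{P}^*(A) + \mathbf{P}^*(B), \tag{A1.2} \]

where $A, B \in \mathfrak{A}$ and $A \cap B = \varnothing$. Let $A_n \in \mathcal{A}$ and $B_n \in \mathcal{A}$ be such that $\rho(A, A_n) \to 0$ 
and $\rho(B, B_n) \to 0$ as $n \to \infty$. Then 

\[ |\mathbf{P}^*(A \cup B) - \mathbf{P}^*(A_n \cup B_n)| \le \rho(A \cup B, A_n \cup B_n) \le \rho(A, A_n) + \rho(B, B_n) \to 0, \]

\[ \mathbf{P}^*(A_n \cup B_n) = \mathbf{P}(A_n \cup B_n) = \mathbf{P}(A_n) + \mathbf{P}(B_n) - \mathbf{P}(A_n B_n). \tag{A1.3} \]

Here 

\[ \mathbf{P}(A_n) \to \mathbf{P}^*(A), \quad \mathbf{P}(B_n) \to \mathbf{P}^*(B), \]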


¹The theorem on the extension of a measure to the minimum $\sigma$-algebra containing $\mathcal{A}$ was obtained 
by C. Carathéodory. The metrisation of normed Boolean algebras $\mathcal{A}$ by the distance $\rho(A, B) = 
\mathbf{P}(A \oplus B)$ was used by many authors (see, e.g., the talk by A.N. Kolmogorov at the 6th Polish 
Mathematical Congress in 1948 and Halmos [19]). 

It was L.Ya. Savel’ev who suggested the use of the continuity properties of the measure with 
respect to the distance $\rho(A, B) = \mathbf{P}^*(A \oplus B)$ in order to extend it. 


\[ \mathbf{P}(A_n B_n) \le \mathbf{P}^*(A_n B) + \mathbf{P}^*(B_n\overline{B}) \le \mathbf{P}^*(A_n\overline{A}) + \mathbf{P}^*(B_n\overline{B}) \le \rho(A, A_n) + \rho(B, B_n) \to 0. \]


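Hence (A1.3) implies (A1.2). 
We now prove countable additivity. Let $A_k \in \mathfrak{A}$ be disjoint. Then, putting 

\[ A := \bigcup_{k=1}^{\infty} A_k, \]

we obtain from the finite additivity of $\overline{\mathbf{P}}$ that 

\[ \overline{\mathbf{P}}(A) = \sum_{k=1}^{n} \overline{\mathbf{P}}(A_k) + \overline{\mathbf{P}}\Bigl(\bigcup_{k=n+1}^{\infty} A_k\Bigr). \]

Therefore 

\[ \overline{\mathbf{P}}(A) \ge \sum_{k=1}^{\infty} \overline{\mathbf{P}}(A_k). \]

On the other hand, 

\[ \overline{\mathbf{P}}(A) = \mathbf{P}^*(A) \le \sum_{k=1}^{\infty} \mathbf{P}^*(A_k) = \sum_{k=1}^{\infty} \overline{\mathbf{P}}(A_k). \]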


The theorem is proved. 


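Theorem A1.2 The extension of the probability $\mathbf{P}$ from the algebra $\mathcal{A}$ to the 
$\sigma$-algebra $\mathfrak{A}$ is unique. 


Proof Assume that there exists another probability $\mathbf{P}_1$ on $\mathfrak{A}$, which coincides with 
$\mathbf{P}$ on $\mathcal{A}$ and is such that, for some $A \in \mathfrak{A}$, 

\[ \mathbf{P}_1(A) \ne \overline{\mathbf{P}}(A). \]

Suppose first that $\varepsilon = \mathbf{P}_1(A) - \overline{\mathbf{P}}(A) > 0$. Consider a sequence $\{B_n\} \in \gamma(A)$ such 
that 

\[ \sum_{n=1}^{\infty} \mathbf{P}(B_n) - \overline{\mathbf{P}}(A) < \frac{\varepsilon}{2}. \]

Then 

\[ \mathbf{P}_1(A) = \overline{\mathbf{P}}(A) + \varepsilon > \sum_{n=1}^{\infty} \mathbf{P}(B_n) + \varepsilon/2, \]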


which contradicts the assumption that $A \subset \bigcup_n B_n$. Therefore 

\[ \mathbf{P}_1(A) \le \overline{\mathbf{P}}(A), \quad A \in \mathfrak{A}. \]


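Since $\overline{\mathbf{P}}$ is $\rho$-continuous at the point $\varnothing$, it follows that $\mathbf{P}_1$ is also $\rho$-continuous at the 
point $\varnothing$, and hence at any “point” $A \in \mathfrak{A}$. Indeed, by virtue of (A1.1), 

\[ |\mathbf{P}_1(A) - \mathbf{P}_1(B)| \le \mathbf{P}_1(A \oplus B) \le \overline{\mathbf{P}}(A \oplus B) \to 0 \]

if only $\rho(A, B) = \overline{\mathbf{P}}(A \oplus B) \to 0$. Hence, for $A \in \mathfrak{A}$, 

\[ \overline{\mathbf{P}}(A) = \lim_{B \to A,\ B \in \mathcal{A}} \mathbf{P}(B) = \lim_{B \to A,\ B \in \mathcal{A}} \mathbf{P}_1(B) = \mathbf{P}_1(A). \]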


The theorem is proved. 


Let $\mathfrak{A}^* = \sigma(\mathcal{A})$ be the $\sigma$-algebra generated by $\mathcal{A}$. Since $\mathcal{A} \subset \mathfrak{A}$, we have $\mathfrak{A}^* \subset \mathfrak{A}$, 
and the next statement follows in an obvious way from the above assertions. 


Corollary A1.1 The probability $\mathbf{P}$ can be uniquely extended from the algebra $\mathcal{A}$ to 
the $\sigma$-algebra $\mathfrak{A}^*$ generated by $\mathcal{A}$. 


Remark A1.1 The $\sigma$-algebra $\mathfrak{A}$ defined above as the closure of the algebra $\mathcal{A}$ with 
respect to the introduced distance $\rho$ is in many cases wider than the $\sigma$-algebra $\mathfrak{A}^* = 
\sigma(\mathcal{A})$ generated by $\mathcal{A}$. This fact is closely related to the concept of the completion of 
a measure. To explain the concept, we assume from the very beginning that $\mathcal{A} = \mathfrak{F}$ 
is a $\sigma$-algebra. Then the measure $\overline{\mathbf{P}}$ can be constructed in a rather simple way. To 
do this we extend the measure $\mathbf{P}$ from $(\Omega, \mathfrak{F})$ to a $\sigma$-algebra which is wider than $\mathfrak{F}$ 
and is constructed as follows. We will say that a subset $N$ of $\Omega$ belongs to the class 
$\mathcal{N}$ if there exists an $A = A(N) \in \mathfrak{F}$ such that $N \subset A$ and $\mathbf{P}(A) = 0$. It is not hard to 
see that the class of all sets of the form $B \cup N$, where $B \in \mathfrak{F}$ and $N \in \mathcal{N}$, also forms 
a $\sigma$-algebra. Denote it by $\mathfrak{F}_{\mathcal{N}}$. Putting $\overline{\mathbf{P}}(B \cup N) := \mathbf{P}(B)$ we obtain an extension of 
$\mathbf{P}$ to $(\Omega, \mathfrak{F}_{\mathcal{N}})$. Such a measure is said to be complete, and the above operation itself 
is called the completion of the measure $\mathbf{P}$. 


Now we can say that the measure $\overline{\mathbf{P}}$ constructed in Theorem A1.1 is complete, 
and the $\sigma$-algebra $\mathfrak{A}$ coincides with $\mathfrak{F}_{\mathcal{N}}$. 

If, for example, $\Omega = [0, 1]$ and $\mathcal{A}$ is the algebra generated by the intervals, then 
$\mathfrak{A}^* = \sigma(\mathcal{A})$ will, as we already know, be the Borel $\sigma$-algebra, and $\mathfrak{A}$ will be the 
Lebesgue extension of $\mathfrak{A}^*$ consisting of all “Lebesgue measurable” sets. 


Appendix 2 
Kolmogorov’s Theorem on Consistent 
Distributions 


In this appendix we will prove the Kolmogorov theorem asserting that consistent 
distributions define a unique probability measure such that the consistent distribu- 
tions are its projections. We used this theorem in Sect. 5.5 and in some other places, 
where distributions on infinite-dimensional spaces were considered. 

Let $T$ be an index set and, for each $t \in T$, $\mathbb{R}_t$ be the real line $(-\infty, \infty)$. Let 
$N \subset T$ be a finite subset of $T$. Then the product space 

\[ \prod_{t \in N} \mathbb{R}_t = \mathbb{R}^N \]

is a Euclidean space of dimension equal to the number $n$ of elements in $N$, spanned 
on $n$ axes of the space 

\[ \mathbb{R}^T = \prod_{t \in T} \mathbb{R}_t. \]

Assume that, for any finite subset $N \subset T$, a probability measure $\mathbf{P}_N$ is given on 
$(\mathbb{R}^N, \mathfrak{B}^N)$, where $\mathfrak{B}^N$ is the $\sigma$-algebra of Borel subsets of $\mathbb{R}^N$. Thereby a family of 
measures is given on $\mathbb{R}^T$. The family is said to be consistent if, for any $L \subset N$ and 
any Borel set $B$ from $\mathbb{R}^L$, 

\[ \mathbf{P}_L(B) = \mathbf{P}_N(B \times \mathbb{R}^{N-L}). \]

The measure $\mathbf{P}_L$ is said to be the projection of $\mathbf{P}_N$ onto $\mathbb{R}^L$. A set from $\mathbb{R}^T$ that 
can be represented in the form $B \times \mathbb{R}^{T-N}$, where $B \in \mathfrak{B}^N$ and $N$ is a finite set, is 
called a cylinder set in $\mathbb{R}^T$. The set $B$ is said to be the base of the cylinder. 

Denote by $\mathfrak{B}^T$ the $\sigma$-algebra of sets from $\mathbb{R}^T$ generated by all cylinder sets. 
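
The consistency condition is easy to test on a discrete example. The sketch below is an illustration only: it assumes a hypothetical family of laws on $\{0, 1\}^n$ (a two-state Markov chain, with the values 0 and 1 regarded as points of the real line) and checks that projecting the three-dimensional law onto the first two coordinates recovers the two-dimensional one.

```python
from itertools import product


def joint_pmf(num_steps, p_stay=0.7):
    """Hypothetical finite-dimensional law P_N on {0, 1}^num_steps:
    a two-state Markov chain started uniformly, staying put with probability p_stay."""
    pmf = {}
    for path in product((0, 1), repeat=num_steps):
        prob = 0.5
        for prev, cur in zip(path, path[1:]):
            prob *= p_stay if cur == prev else 1.0 - p_stay
        pmf[path] = prob
    return pmf


def project(pmf, keep):
    """Projection onto the coordinates in `keep`, i.e. P_L(B) = P_N(B x R^(N-L))."""
    marginal = {}
    for point, prob in pmf.items():
        key = tuple(point[i] for i in keep)
        marginal[key] = marginal.get(key, 0.0) + prob
    return marginal


if __name__ == "__main__":
    p3, p2 = joint_pmf(3), joint_pmf(2)
    # The two dictionaries should agree up to rounding.
    print(project(p3, (0, 1)))
    print(p2)
```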


Theorem A2.1 (Kolmogorov) If a consistent family of probability measures is given 
on $\mathbb{R}^T$, then there exists a unique probability measure $\mathbf{P}$ on $(\mathbb{R}^T, \mathfrak{B}^T)$ such that, for 
any $N$, the measure $\mathbf{P}_N$ coincides with the projection of $\mathbf{P}$ onto $\mathbb{R}^N$. 




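Proof The cylinder subsets of $\mathbb{R}^T$ form an algebra. We show that, for $B \in \mathfrak{B}^N$, the 
relations 

\[ \mathbf{P}(B \times \mathbb{R}^{T-N}) = \mathbf{P}_N(B) \tag{A2.1} \]

define a measure on this algebra. First of all, by consistency of the measures $\mathbf{P}_N$, 
this definition of probability on cylinder sets is consistent (we mean the cases when 
$B = B_1 \times \mathbb{R}^{N-L}$ for $B_1 \in \mathfrak{B}^L$; then the left-hand side of (A2.1) will also be equal 
to $\mathbf{P}(B_1 \times \mathbb{R}^{T-L})$). Further, the thus defined probability is additive. Indeed, let 
$B_1 \times \mathbb{R}^{T-N_1}$ and $B_2 \times \mathbb{R}^{T-N_2}$ be two disjoint cylinder sets. Then, putting $N = N_1 \cup N_2$, 
we will have 

\[ \begin{aligned} \mathbf{P}\bigl((B_1 \times \mathbb{R}^{T-N_1}) \cup (B_2 \times \mathbb{R}^{T-N_2})\bigr) &= \mathbf{P}\bigl(\{(B_1 \times \mathbb{R}^{N-N_1}) \cup (B_2 \times \mathbb{R}^{N-N_2})\} \times \mathbb{R}^{T-N}\bigr) \\ &= \mathbf{P}_N\bigl(\{(B_1 \times \mathbb{R}^{N-N_1}) \cup (B_2 \times \mathbb{R}^{N-N_2})\}\bigr) \\ &= \mathbf{P}_N(B_1 \times \mathbb{R}^{N-N_1}) + \mathbf{P}_N(B_2 \times \mathbb{R}^{N-N_2}) \\ &= \mathbf{P}(B_1 \times \mathbb{R}^{T-N_1}) + \mathbf{P}(B_2 \times \mathbb{R}^{T-N_2}). \end{aligned} \]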


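To verify that $\mathbf{P}$ is countably additive, we make use of the equivalence of 
properties P3 and P3′ (see Chap. 2). By this equivalence, it suffices to show that if $B_n$, 
$n = 1, 2, \dots$, is a decreasing sequence of cylinder sets and, for some $\varepsilon > 0$, we 
have $\mathbf{P}(B_n) > \varepsilon$, $n = 1, 2, \dots$, then $B = \bigcap_{n=1}^{\infty} B_n$ is not empty. Since the $B_n$ are 
enclosed in all the preceding sets, in the representation $B_n = \widetilde{B}_n \times \mathbb{R}^{T-N_n}$ one has 
$N_n \subset N_{n+1}$ and $\widetilde{B}_{n+1} \cap \mathbb{R}^{N_n} \subset \widetilde{B}_n$. Without loss of generality, we will assume that 
the number of elements in the set $N_n = \{t_1,\dots,t_n\}$ is equal to $n$, and denote by $x_k$ 
(with various superscripts) the coordinates in the space $\mathbb{R}_{t_k}$. 
Thus, let 

\[ \mathbf{P}(B_n) = \mathbf{P}_{N_n}(\widetilde{B}_n) \ge \varepsilon > 0. \]

We prove that the intersection 

\[ B := \bigcap_{n=1}^{\infty} B_n \]

is non-empty. For any Borel set $\widetilde{B}_n \subset \mathbb{R}^{N_n}$, there exists a compactum $\widetilde{K}_n$ such that 

\[ \widetilde{K}_n \subset \widetilde{B}_n, \quad \mathbf{P}_{N_n}(\widetilde{B}_n - \widetilde{K}_n) < \frac{\varepsilon}{2^{n+1}}. \]

Setting $K_n := \widetilde{K}_n \times \mathbb{R}^{T-N_n}$, we obtain 

\[ \mathbf{P}(B_n - K_n) = \mathbf{P}_{N_n}(\widetilde{B}_n - \widetilde{K}_n) < \frac{\varepsilon}{2^{n+1}}. \]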




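Introduce the sets $D_n := \bigcap_{k=1}^{n} K_k$. It is easy to see that $D_n \subset B_n$ are also cylinders. 
Because 

\[ B_n - \bigcap_{k=1}^{n} K_k \subset \bigcup_{k=1}^{n} (B_k - K_k), \]

we have 

\[ \mathbf{P}(B_n - D_n) \le \mathbf{P}\Bigl(\bigcup_{k=1}^{n} (B_k - K_k)\Bigr) \le \sum_{k=1}^{n} \mathbf{P}(B_k - K_k) < \frac{\varepsilon}{2}, \]

\[ \mathbf{P}(D_n) \ge \mathbf{P}(B_n) - \frac{\varepsilon}{2} \ge \frac{\varepsilon}{2}. \]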


It follows that $D_n$ is a decreasing sequence of non-empty cylinder sets. Denote by 
$X^n = (x_1^n, x_2^n, \dots, x_n^n)$ an arbitrary point of the base 

\[ \widetilde{D}_n = \bigcap_{k=1}^{n} \bigl(\widetilde{K}_k \times \mathbb{R}^{N_n - N_k}\bigr) \]

of the cylinder $D_n$. The point specifies a cylinder subset $X^n$ of $\mathbb{R}^T$. Since the sets $D_n$ 
decrease, we have $(x_1^{n+r}, \dots, x_n^{n+r}) \in \widetilde{K}_n$ for any $r \ge 0$. By compactness of 
$\widetilde{K}_1$, we can choose a subsequence $n_{1k}$ such that $x_1^{n_{1k}} \to x_1$ as $k \to \infty$. From this 
subsequence, one can choose a subsequence $n_{2k}$ such that $x_2^{n_{2k}} \to x_2$, and so on. 
Now consider the diagonal sequence of the points (or, more precisely, cylinder 
sets) $X^{n_{kk}} = (x_1^{n_{kk}}, x_2^{n_{kk}}, \dots, x_{n_{kk}}^{n_{kk}})$. It is clear that 

\[ X^{n_{kk}} \to X = (x_1, x_2, \dots) \]

(component-wise) as $k \to \infty$, and that 

\[ (x_1^{n_{kk}}, \dots, x_m^{n_{kk}}) \to (x_1, \dots, x_m) \in \widetilde{K}_m \]

for any $m$. This means that, for the set $X$ corresponding to the point $X$, one has 

\[ X = \{y(t) \in \mathbb{R}^T : y(t_1) = x_1,\ y(t_2) = x_2,\ \dots\} \subset K_m \subset B_m \]

for any $m$, and therefore 

\[ X \subset \bigcap_{m=1}^{\infty} B_m. \]


Thus $B$ is non-empty, and the countable additivity of $\mathbf{P}$ on the algebra of cylinder 
sets is proved. Hence $\mathbf{P}$ is a measure, and it remains to make use of the theorem 
on the extension of a measure from an algebra to the $\sigma$-algebra generated by that 
algebra. 

The theorem is proved. 


Appendix 3 
Elements of Measure Theory and Integration 


In this appendix, the properties of integrals with respect to a measure are presented 
in more detail than in Chaps. 4 and 6. We also prove the basic theorems on decom- 
position of measure and on convergence of sequences of measures. 


3.1 Measure Spaces 


Let $(\Omega, \mathfrak{F})$ be a measurable space. We will say that a measure space $(\Omega, \mathfrak{F}, \mu)$ is 
given if $\mu$ is a nonnegative countably additive set function on $\mathfrak{F}$, i.e. a function 
having the following properties: 

(1) $\mu(\bigcup_j A_j) = \sum_j \mu(A_j)$ for any countable collection of disjoint sets $A_j \in \mathfrak{F}$ 
($\sigma$-additivity); 
(2) $\mu(A) \ge 0$ for any $A \in \mathfrak{F}$; 
(3) $\mu(\varnothing) = 0$, where $\varnothing$ is the empty set. 

The value $\mu(A)$ is called the measure of the set $A$. We will only consider finite 
and $\sigma$-finite measures. In the former case one assumes that $\mu(\Omega) < \infty$. In the latter 
case there exists a partition of $\Omega$ into countably many sets $A_j$ such that $\mu(A_j) < \infty$. 

A probability space is an example of a space with a finite (unit) measure. The 
space $(\mathbb{R}, \mathfrak{B}, \mu)$, where $\mathbb{R}$ is the real line, $\mathfrak{B}$ is the $\sigma$-algebra of Borel sets, and $\mu$ is 
the Lebesgue measure, is an example of a space with a $\sigma$-finite measure. 

We can also consider such set functions $\mu(A)$ that satisfy conditions (1) and (3) 
only, but are not necessarily nonnegative. Such functions are called signed measures. 
Any finite signed measure (i.e., such that $\sup_A \mu(A) < \infty$ and $\inf_A \mu(A) > -\infty$) 
can be represented as a difference of two nonnegative measures (the Hahn 
decomposition theorem, see Sect. 3.5 of the present appendix). We will need signed measures 
in Sect. 3.5 only. Everywhere else, unless otherwise specified, by measures we will 
understand set functions possessing properties (1)–(3). 

In the same manner as when establishing the simplest properties of probability, 
one easily establishes the following properties of measures: 




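(1) $\mu(A) \le \mu(B)$ if $A \subset B$, 
(2) $\mu(\bigcup_j A_j) \le \sum_j \mu(A_j)$ for any $A_j$, 
(3) if $A_n \subset A_{n+1}$ and $\bigcup_n A_n = A$ then $\mu(A_n) \to \mu(A)$, or, which is the same, 
(3′) if $A_n \supset A_{n+1}$, $\bigcap_n A_n = A$, and $\mu(A_1) < \infty$ then $\mu(A_n) \to \mu(A)$. 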


Consider further measurable functions on $(\Omega, \mathfrak{F})$, i.e., functions $\xi(\omega)$ having the 
property $\{\omega : \xi(\omega) \in B\} \in \mathfrak{F}$ for any Borel subset $B$ of the real line. 

The notions of convergence in measure and convergence almost everywhere are 
introduced similarly to the case of probability measure. 

We will say that a sequence of measurable functions $\xi_n$ converges to $\xi$ almost 
everywhere (a.e.): $\xi_n \xrightarrow{a.e.} \xi$ as $n \to \infty$ if $\xi_n(\omega) \to \xi(\omega)$ for all $\omega$ except from a set 
of measure 0. 

We will say that the $\xi_n$ converge to $\xi$ in measure: $\xi_n \xrightarrow{\mu} \xi$ if, for any $\varepsilon > 0$, as 
$n \to \infty$, 

\[ \mu(\{|\xi_n - \xi| > \varepsilon\}) \to 0. \]


Now we turn to the construction of integrals and the study of their properties. 
First we consider finite measures assuming them without loss of generality to be 
probability measures. In that case we will write $\mathbf{P}(A)$ instead of $\mu(A)$. We will turn 
to integrals with respect to arbitrary measures in Sect. 3.4. 


3.2 The Integral with Respect to a Probability Measure 


3.2.1 The Integrals of a Simple Function 


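A measurable function $\xi(\omega)$ is said to be simple if its range is finite. The indicator 
of a set $F \in \mathfrak{F}$ is the simple function 

\[ I_F(\omega) = \begin{cases} 1, & \text{if } \omega \in F, \\ 0, & \text{if } \omega \notin F. \end{cases} \]

Clearly, any simple function $\xi(\omega)$ can be written in the form 

\[ \xi(\omega) = \sum_{k=1}^{n} x_k I_{F_k}(\omega), \]

where $x_k$, $k = 1, 2, \dots, n$, are values assumed by $\xi$, and $F_k = \{\omega : \xi(\omega) = x_k\}$. The 
sets $F_k \in \mathfrak{F}$ are disjoint, and $\bigcup_{k=1}^{n} F_k = \Omega$. The integral of the simple function $\xi(\omega)$ 
with respect to a measure $\mathbf{P}$ is defined as the quantity 

\[ \int \xi\,d\mathbf{P} = \int \xi(\omega)\,d\mathbf{P}(\omega) = \sum_{k=1}^{n} x_k\mathbf{P}(F_k) = \mathbf{E}\xi. \]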


The integral of the simple function $\xi(\omega)$ over a set $A \in \mathfrak{F}$ is defined as 

\[ \int_A \xi\,d\mathbf{P} = \int \xi(\omega)I_A(\omega)\,d\mathbf{P}(\omega). \]

That these definitions are consistent (the partitions into sets $F_k$ may be different) 
can be verified in an obvious way. 
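
The independence of the integral from the chosen partition can be seen on a small finite example. The following is a minimal sketch with a hypothetical four-point probability space; exact rational arithmetic is used so that the two values can be compared without rounding.

```python
from fractions import Fraction


def integrate_simple(values_and_sets, prob):
    """Integral of a simple function given as [(x_k, F_k), ...] with disjoint sets F_k."""
    return sum(x * sum(prob[w] for w in f) for x, f in values_and_sets)


# Hypothetical finite probability space with four outcomes.
PROB = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(1, 8)}

# The same function (2 on {a, b}, 5 on {c, d}) written over two different partitions.
COARSE = [(2, {"a", "b"}), (5, {"c", "d"})]
FINE = [(2, {"a"}), (2, {"b"}), (5, {"c"}), (5, {"d"})]

if __name__ == "__main__":
    print(integrate_simple(COARSE, PROB), integrate_simple(FINE, PROB))
```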


3.2.2 The Integrals of an Arbitrary Function 


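Lemma A3.2.1 Let $\xi(\omega) \ge 0$. There exists a sequence $\xi_n(\omega)$ of simple functions 
such that $\xi_n(\omega) \uparrow \xi(\omega)$ as $n \to \infty$ for all $\omega \in \Omega$. 


Proof Partition the segment $[0, n]$ into $n2^n$ equal intervals. Let 

\[ x_0 = 0, \quad x_1 = 2^{-n}, \quad \dots, \quad x_{n2^n} = n \]

denote the partition points, so that $x_{i+1} - x_i = 2^{-n}$. Put 

\[ F_i := \{\omega : x_i \le \xi(\omega) < x_{i+1}\}, \quad i = 1, 2, \dots, n2^n - 1; \]

\[ F_0 := \{0 \le \xi(\omega) < x_1\} \cup \{\xi(\omega) \ge n\}, \qquad \xi_n(\omega) := \sum_{i=0}^{n2^n - 1} x_i I_{F_i}(\omega) \le \xi(\omega). \]

The function $\xi_n(\omega)$ is clearly simple, $\xi_n(\omega) \le \xi_{n+1}(\omega) \le \xi(\omega)$ for all $\omega$, and has the 
property that if $n > \xi(\omega)$ at a point $\omega \in \Omega$ then 

\[ 0 \le \xi(\omega) - \xi_n(\omega) \le \frac{1}{2^n}. \]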

The lemma is proved. 
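
The construction in the proof can be written out pointwise. The sketch below is an illustration with a hypothetical function name: given the value $\xi(\omega)$ at a single point, it returns $\xi_n(\omega)$ by rounding down to the grid of step $2^{-n}$ on $[0, n)$, and returns 0 where $\xi(\omega) \ge n$, as in the definition of the set $F_0$.

```python
import math


def simple_approximation(value, n):
    """Value of the n-th simple approximation at a point where the function equals `value`.

    Follows the construction in the proof: round down to the grid of step 2**-n
    on [0, n), and return 0 on the set where the function is at least n.
    """
    if value < 0:
        raise ValueError("the construction assumes a nonnegative function")
    if value >= n:
        return 0.0
    return math.floor(value * 2**n) / 2**n


if __name__ == "__main__":
    x = math.pi
    # The approximations are non-decreasing in n and approach x from below.
    for n in range(1, 8):
        print(n, simple_approximation(x, n))
```

Once $n$ exceeds the value being approximated, the error is at most $2^{-n}$, which is the bound stated at the end of the proof.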


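Lemma A3.2.2 Let $\xi_n \uparrow \xi \ge 0$ and $\eta_n \uparrow \xi \ge 0$ be sequences of simple functions. 
Then 

\[ \lim_{n\to\infty} \int \xi_n\,d\mathbf{P} = \lim_{n\to\infty} \int \eta_n\,d\mathbf{P}. \]

Proof We verify that, for any $m$, 

\[ \int \xi_m\,d\mathbf{P} \le \lim_{n\to\infty} \int \eta_n\,d\mathbf{P}. \]

The function $\xi_m$ is simple. Therefore it is bounded by some constant: $\xi_m \le c_m$. 
Hence, for any integer $n$ and $\varepsilon > 0$, 

\[ \xi_m - \eta_n \le c_m I_{\{\xi_m > \eta_n + \varepsilon\}} + \varepsilon. \]

This implies that 

\[ \mathbf{E}\xi_m \le c_m\mathbf{P}\{\xi_m > \eta_n + \varepsilon\} + \varepsilon + \mathbf{E}\eta_n. \]

The probability on the right-hand side vanishes as $n \to \infty$: 

\[ \mathbf{P}\{\xi_m > \eta_n + \varepsilon\} \le \mathbf{P}\{\xi > \eta_n + \varepsilon\} \to 0, \]

because $\eta_n$ converges almost surely (and hence in probability) to $\xi$. Therefore 
$\mathbf{E}\xi_m \le \varepsilon + \lim_{n\to\infty} \mathbf{E}\eta_n$. Since $\varepsilon$ is arbitrary, 

\[ \lim_{n\to\infty} \mathbf{E}\xi_n \le \lim_{n\to\infty} \mathbf{E}\eta_n. \]

Swapping $\{\xi_n\}$ and $\{\eta_n\}$, we obtain the converse inequality. 
The lemma is proved. 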


The assertions of Lemmas A3.2.1 and A3.2.2 make the following definitions 
consistent. 

The integral of a nonnegative measurable function $\xi(\omega)$ (with respect to measure 
$\mathbf{P}$) is the quantity 

\[ \int \xi\,d\mathbf{P} = \lim_{n\to\infty} \int \xi_n\,d\mathbf{P}, \tag{A3.2.1} \]

where $\xi_n$ is a sequence of simple functions such that $\xi_n \uparrow \xi$ as $n \to \infty$. 

The integral $\int \xi\,d\mathbf{P}$ will also be denoted by $\mathbf{E}\xi$. We will say that the integral 
$\int \xi\,d\mathbf{P}$ exists and $\xi$ is integrable if $\mathbf{E}\xi < \infty$. 

The integral of an arbitrary function $\xi(\omega)$ (assuming values of both signs) with 
respect to measure $\mathbf{P}$ is the quantity 

\[ \mathbf{E}\xi = \mathbf{E}\xi^+ - \mathbf{E}\xi^-, \quad \xi^{\pm} := \max(0, \pm\xi), \]

which is defined when at least one of the values $\mathbf{E}\xi^{\pm}$ is finite. Otherwise $\mathbf{E}\xi$ is 
undefined. The integral $\mathbf{E}\xi$ exists if and only if $\mathbf{E}|\xi| < \infty$ (for $|\xi| = \xi^+ + \xi^-$). 
If $\mathbf{E}\xi$ exists then 

\[ \mathbf{E}(\xi; A) := \int_A \xi\,d\mathbf{P} = \mathbf{E}\xi I_A \]

exists for any $A \in \mathfrak{F}$ as well. 


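Lemma A3.2.3 If $\mathbf{E}\xi$ exists and $B_n \in \mathfrak{F}$ is a sequence of sets such that $\mathbf{P}(B_n) \to 0$ 
as $n \to \infty$, then 

\[ \mathbf{E}(\xi; B_n) \to 0. \]

Proof For any sequence $|\xi_m| \uparrow |\xi|$ of simple functions and $A_m := \{|\xi| \le m\}$ one has 

\[ \mathbf{E}|\xi| \ge \lim_{m\to\infty} \mathbf{E}|\xi|I_{A_m} \ge \lim_{m\to\infty} \mathbf{E}|\xi_m|I_{A_m} = \mathbf{E}|\xi|, \]

since $|\xi_m|I_{A_m} \uparrow |\xi|$. This implies that 

\[ \mathbf{E}|\xi| = \lim_{m\to\infty} \mathbf{E}|\xi|I_{A_m} = \lim_{m\to\infty} \mathbf{E}(|\xi|; |\xi| \le m), \]

and hence, for any $\varepsilon > 0$, there exists an $m(\varepsilon)$ such that 

\[ \mathbf{E}|\xi| - \mathbf{E}(|\xi|; |\xi| \le m) < \varepsilon \]

for $m \ge m(\varepsilon)$. Consequently, for such $m$, one has 

\[ \mathbf{E}(|\xi|; B_n) = \mathbf{E}(|\xi|; \{|\xi| \le m\}B_n) + \mathbf{E}(|\xi|; \{|\xi| > m\}B_n) \le m\mathbf{P}(B_n) + \varepsilon, \]

and hence 

\[ \limsup_{n\to\infty} \mathbf{E}(|\xi|; B_n) \le \varepsilon. \]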


The lemma is proved. 


Note that Lemma 6.1.2 somewhat extends Lemma A3.2.3. 


Corollary A3.2.1 If $\mathbf{E}\xi$ is well-defined (the values $\pm\infty$ not being excluded) and 
$B_n \in \mathfrak{F}$ is a sequence of sets such that $\mathbf{P}(B_n) \to 1$ as $n \to \infty$, then 

\[ \mathbf{E}(\xi; B_n) \to \mathbf{E}\xi. \]

Proof If $\mathbf{E}\xi$ exists then the required assertion follows from Lemma A3.2.3. 

Now let $\mathbf{E}\xi = \infty$. Then $\mathbf{E}\xi^- < \infty$ and $\mathbf{E}\xi^+ = \infty$, where $\xi^{\pm} = \max(0, \pm\xi)$. It 
follows that $\mathbf{E}(\xi^-; B_n) \to \mathbf{E}\xi^-$ as $n \to \infty$. We show that 

\[ \mathbf{E}(\xi^+; B_n) \to \infty. \tag{A3.2.2} \]

Let $A_k := \{\xi \in [2^{k-1}, 2^k)\}$, $k = 1, 2, \dots$; $p_k := \mathbf{P}(A_k)$. We can assume 
without loss of generality that all $p_k > 0$ (if this is not the case we can consider a 
subsequence $k_j$ such that all $p_{k_j} > 0$). Since $\mathbf{E}\xi^+ \le 1 + \sum_{k=1}^{\infty} 2^k p_k$, we have 
$\sum_{k=1}^{\infty} 2^k p_k = \infty$. For a given $N \ge 1$, choose $n$ large enough such that $\mathbf{P}(B_n A_k) > 
p_k/2$ for all $k \le N$. Then 

\[ \mathbf{E}(\xi^+; B_n) \ge \sum_{k=1}^{N} 2^{k-2} p_k, \]

where the right-hand side can be made arbitrarily large by an appropriate choice 
of $N$. This proves (A3.2.2). Since $\xi = \xi^+ - \xi^-$, the required convergence is proved. 
The case $\mathbf{E}\xi = -\infty$ can be dealt with in the same way. The corollary is proved. 




3.2.3 Properties of Integrals 


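I1. If sets $A_j \in \mathfrak{F}$ are disjoint and $\bigcup_j A_j = \Omega$ then 

\[ \int \xi\,d\mathbf{P} = \sum_j \int_{A_j} \xi\,d\mathbf{P}. \tag{A3.2.3} \]

Proof It suffices to prove this relation for $\xi(\omega) \ge 0$. For simple functions 
equality (A3.2.3) is obvious, because 

\[ \int \xi\,d\mathbf{P} = \sum_k x_k\mathbf{P}(\xi = x_k) = \sum_k x_k\sum_j \mathbf{P}(\xi = x_k; A_j). \]

In the general case, using definition (A3.2.1) one gets 

\[ \begin{aligned} \int \xi\,d\mathbf{P} &= \lim_{n\to\infty} \int \xi_n\,d\mathbf{P} = \lim_{n\to\infty} \sum_j \int_{A_j} \xi_n\,d\mathbf{P} \\ &= \sum_j \lim_{n\to\infty} \int_{A_j} \xi_n\,d\mathbf{P} = \sum_j \int_{A_j} \xi\,d\mathbf{P}. \end{aligned} \tag{A3.2.4} \]

Swapping summation and passage to the limit is justified here, for by Lemma A3.2.3 

\[ \sum_{j=N}^{\infty} \int_{A_j} \xi_n\,d\mathbf{P} = \mathbf{E}\Bigl(\xi_n; \bigcup_{j=N}^{\infty} A_j\Bigr) \le \mathbf{E}\Bigl(\xi; \bigcup_{j=N}^{\infty} A_j\Bigr) \to 0 \]

as $N \to \infty$ uniformly in $n$. 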


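I2. $\int (\xi + \eta)\, d\mathbf{P} = \int \xi\, d\mathbf{P} + \int \eta\, d\mathbf{P}.$

Proof For simple functions this property is obvious. Hence, for $\xi \ge 0$ and $\eta \ge 0$,
this property follows from the additivity of the limit.
In the general case we have ($\xi^\pm$ and $\eta^\pm$ are defined here as before)

$$\int (\xi + \eta)\, d\mathbf{P} = \int (\xi^+ + \eta^+)\, d\mathbf{P} - \int (\xi^- + \eta^-)\, d\mathbf{P}$$

$$= \int \xi^+\, d\mathbf{P} - \int \xi^-\, d\mathbf{P} + \int \eta^+\, d\mathbf{P} - \int \eta^-\, d\mathbf{P} = \int \xi\, d\mathbf{P} + \int \eta\, d\mathbf{P}.$$

I3. If $c$ is an arbitrary constant, then

$$\int c\xi\, d\mathbf{P} = c \int \xi\, d\mathbf{P}.$$
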



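I4. If $\xi \le \eta$, then $\int \xi\, d\mathbf{P} \le \int \eta\, d\mathbf{P}$.

The proof of properties I3 and I4 is obvious. Since

$$\int \xi\, d\mathbf{P} = \mathbf{E}\xi,$$

we can write down properties I1–I4 in terms of expectations as follows:

I1. $\mathbf{E}\xi = \sum_j \mathbf{E}(\xi; A_j)$ if $A_j$ are disjoint and $\bigcup_j A_j = \Omega$.
I2. $\mathbf{E}(\xi + \eta) = \mathbf{E}\xi + \mathbf{E}\eta$.
I3. $\mathbf{E}a\xi = a\mathbf{E}\xi$.
I4. $\mathbf{E}\xi \le \mathbf{E}\eta$, if $\xi \le \eta$.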


Note also the following properties of integrals which easily follow from I1-I4. 


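I5. $|\mathbf{E}\xi| \le \mathbf{E}|\xi|$.
I6. If $c_1 \le \xi \le c_2$, then $c_1 \le \mathbf{E}\xi \le c_2$.
I7. If $\xi \ge 0$ and $\mathbf{E}\xi = 0$, then $\mathbf{P}(\xi = 0) = 1$.

This property follows from the Chebyshev inequality: $\mathbf{P}(\xi \ge \varepsilon) \le \mathbf{E}\xi/\varepsilon = 0$ for
any $\varepsilon > 0$.

I8. If $\mathbf{P}(\xi = \eta) = 1$ and $\mathbf{E}\xi$ exists then $\mathbf{E}\xi = \mathbf{E}\eta$.
Indeed,

$$\mathbf{E}\eta = \lim_{n\to\infty} \mathbf{E}(\eta; |\eta| \le n) = \lim_{n\to\infty} \mathbf{E}(\xi; |\xi| \le n) = \mathbf{E}\xi.$$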
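
Properties I1, I2 and I5 can be checked mechanically on a finite probability space, where every integral is a finite sum. The sketch below is only an illustration: the six-point space, the weights and the functions `xi` and `eta` are assumptions made for the example, and `E(f, A)` plays the role of $\mathbf{E}(f; A)$.

```python
from fractions import Fraction

# Assumed toy space: six outcomes 0..5 with exact rational probabilities.
P = {w: Fraction(p, 12) for w, p in enumerate([1, 2, 3, 3, 2, 1])}
xi = {0: -3, 1: -1, 2: 0, 3: 2, 4: 5, 5: -4}
eta = {0: 1, 1: 0, 2: 4, 3: -2, 4: 1, 5: 3}

def E(f, A=None):
    """E(f; A): sum of f(w) P(w) over w in A (the whole space if A is None)."""
    points = P if A is None else A
    return sum(f[w] * P[w] for w in points)

partition = [{0, 1}, {2}, {3, 4, 5}]  # disjoint sets whose union is the space

i1_holds = E(xi) == sum(E(xi, A) for A in partition)              # I1
i2_holds = E({w: xi[w] + eta[w] for w in P}) == E(xi) + E(eta)    # I2
i5_holds = abs(E(xi)) <= E({w: abs(xi[w]) for w in P})            # I5
```

Exact rational arithmetic is used so that the equalities in I1 and I2 are tested as equalities, not up to rounding.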


3.3 Further Properties of Integrals 


3.3.1 Convergence Theorems 


A number of convergence theorems were proved in Sect. 6.1. One of them was the 
dominated convergence theorem (Corollary 6.1.3): 

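If $\xi_n \xrightarrow{p} \xi$ as $n \to \infty$ and $|\xi_n| \le \eta$, $\mathbf{E}\eta < \infty$, then the expectation $\mathbf{E}\xi$ exists and
$\mathbf{E}\xi_n \to \mathbf{E}\xi$.

Now we will present some further useful assertions concerning convergence of
integrals.


Theorem A3.3.1 (Monotone convergence) If $0 \le \xi_n \uparrow \xi$, then $\mathbf{E}\xi = \lim_{n\to\infty} \mathbf{E}\xi_n$.


Proof In addition to Corollary 6.1.3, here we only need to prove that $\mathbf{E}\xi_n \to \infty$
if $\mathbf{E}\xi = \infty$. Put $\xi_n^N := \min(\xi_n, N)$ and $\xi^N := \min(\xi, N)$. Then clearly $\xi_n^N \uparrow \xi^N$ as
$n \to \infty$, and $\mathbf{E}\xi_n^N \uparrow \mathbf{E}\xi^N$. Therefore the value $\mathbf{E}\xi_n^N \le \mathbf{E}\xi_n$ can be made arbitrarily
large by choosing appropriate $n$ and $N$. The theorem is proved.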
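
The truncation used in the proof can be followed on a concrete variable with infinite mean. As an assumed example, let $\mathbf{P}(\xi = 2^k) = 2^{-k}$, $k = 1, 2, \ldots$, so that $\mathbf{E}\xi = \sum_k 1 = \infty$. The sketch computes $\mathbf{E}\min(\xi, 2^m)$ exactly, using the closed form $\mathbf{P}(\xi > 2^m) = 2^{-m}$ for the geometric tail; the function name `truncated_mean` is hypothetical.

```python
from fractions import Fraction

def truncated_mean(m):
    """E min(xi, 2**m) when P(xi = 2**k) = 2**-k for k = 1, 2, ..."""
    # Values 2**k with k <= m are not affected by the cap.
    head = sum(2**k * Fraction(1, 2**k) for k in range(1, m + 1))
    # All larger values are replaced by 2**m; their total probability is 2**-m.
    tail_prob = Fraction(1, 2**m)
    return head + 2**m * tail_prob

means = [truncated_mean(m) for m in range(1, 8)]
```

The truncated means equal $m + 1$ and so grow without bound, in agreement with $\mathbf{E}\xi^N \uparrow \mathbf{E}\xi = \infty$.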




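These theorems can be generalised in the following way. To make the extension
of the convergence theorems to the case of integrals with respect to signed measures
in Sect. 3.4 more convenient, we will now write $\mathbf{E}\xi$ in the form of the integral

$$\int \xi\, d\mathbf{P}.$$

Theorem A3.3.2 (Fatou–Lebesgue) Let $\eta$ and $\zeta$ be integrable. If $\xi_n \le \eta$ then

$$\limsup_{n\to\infty} \int \xi_n\, d\mathbf{P} \le \int \limsup_{n\to\infty} \xi_n\, d\mathbf{P}. \tag{A3.3.1}$$

If $\xi_n \ge \zeta$ then

$$\liminf_{n\to\infty} \int \xi_n\, d\mathbf{P} \ge \int \liminf_{n\to\infty} \xi_n\, d\mathbf{P}. \tag{A3.3.2}$$

If $\xi_n \uparrow \xi$ and $\xi_n \ge \zeta$, or $\xi_n \xrightarrow{a.s.} \xi$ and $\zeta \le \xi_n \le \eta$, then

$$\lim_{n\to\infty} \int \xi_n\, d\mathbf{P} = \int \xi\, d\mathbf{P}. \tag{A3.3.3}$$

Proof We prove for instance (A3.3.2). Assume without loss of generality that $\zeta \equiv 0$.
In this case, as $n \to \infty$,

$$\xi_n \ge \eta_n := \inf_{k \ge n} \xi_k \uparrow \liminf_{k\to\infty} \xi_k, \qquad \eta_n \ge 0,$$

and by the monotone convergence theorem

$$\liminf_{n\to\infty} \int \xi_n\, d\mathbf{P} \ge \lim_{n\to\infty} \int \eta_n\, d\mathbf{P} = \int \liminf_{n\to\infty} \xi_n\, d\mathbf{P}.$$

Applying (A3.3.2) to the sequence $\eta - \xi_n$ we obtain (A3.3.1); (A3.3.3) follows from
the previous theorems. The theorem is proved.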
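
Inequality (A3.3.2) can be strict. A standard example, assumed here for illustration, takes Lebesgue measure on $(0, 1)$ and $\xi_n = n$ on $(0, 1/n)$, $\xi_n = 0$ elsewhere. In the sketch below `mean_xi` uses the closed form "height times width" for the integral and does not integrate numerically; the names are hypothetical.

```python
from fractions import Fraction

def xi(n, w):
    """xi_n(w) = n for 0 < w < 1/n and 0 otherwise; w is a point of (0, 1)."""
    return n if w < Fraction(1, n) else 0

def mean_xi(n):
    """Integral of xi_n over (0, 1) under Lebesgue measure: height n, width 1/n."""
    return n * Fraction(1, n)

w = Fraction(1, 7)
# For n >= 7 the point w = 1/7 lies outside (0, 1/n), so xi_n(w) = 0 from then on.
tail_values = [xi(n, w) for n in range(7, 50)]
```

Every integral equals 1, while $\xi_n(\omega) \to 0$ at each fixed $\omega$, so that $\liminf \int \xi_n\, d\mathbf{P} = 1 > 0 = \int \liminf \xi_n\, d\mathbf{P}$. No integrable majorant $\eta$ exists for this sequence, which is why (A3.3.3) does not apply.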


3.3.2 Connection to Integration with Respect to a Measure on the 
Real Line 


Let $g(x)$ be a Borel function given on the real line $\mathbb{R}$ (if $\mathfrak{B}$ is the $\sigma$-algebra of Borel
sets on the line and $B \in \mathfrak{B}$, then $\{x : g(x) \in B\} \in \mathfrak{B}$). If $\xi$ is a random variable
then $\eta := g(\xi(\omega))$ will clearly also be a random variable. As we saw in Sect. 3.2,
a random variable $\xi$ induces the probability space $(\mathbb{R}, \mathfrak{B}, \mathbf{F}_\xi)$ with measure $\mathbf{F}_\xi$ on
the line such that $\mathbf{F}_\xi(B) = \mathbf{P}(\xi \in B)$. Therefore one can speak about integrals with
respect to that measure.


Theorem A3.3.3 If $\eta = g(\xi(\omega))$ and $\mathbf{E}\eta$ exists, then

$$\mathbf{E}\eta = \int_\Omega \eta\, d\mathbf{P} = \int_{\mathbb{R}} g(x)\, \mathbf{F}_\xi(dx)$$

(on the right-hand side we used a somewhat different notation for $\int g\, d\mathbf{F}_\xi$).




Proof Let first $g(x) = I_B(x)$ be the indicator of a set $B \in \mathfrak{B}$. Then $\eta = g(\xi(\omega)) =
I_{\{\xi \in B\}}(\omega)$ and $\mathbf{E}\eta = \mathbf{P}(\xi \in B)$. Therefore

$$\int g(x)\, \mathbf{F}_\xi(dx) = \int I_B(x)\, \mathbf{F}_\xi(dx) = \mathbf{F}_\xi(B) = \mathbf{P}(\xi \in B) = \mathbf{E}\eta.$$

Using the properties of the integral it is easy to establish that the assertion of the
theorem holds for simple functions $g$. Passing to the limit extends that assertion to
bounded functions. Now let $g \ge 0$. If the function $g(\xi) I_B(\xi) = \eta(\omega) I_{\{\xi \in B\}}(\omega)$ is
bounded, then

$$\int_B g(x)\, \mathbf{F}_\xi(dx) = \mathbf{E}(\eta; \xi \in B).$$

Therefore

$$\int_{\{g \le n\}} g\, d\mathbf{F}_\xi = \mathbf{E}(\eta; \eta \le n).$$

Passing to the limit as $n \to \infty$ we get the assertion of the theorem. Considering the
case when $g$ takes values of both signs does not create any difficulties. The theorem
is proved.
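
For a discrete $\xi$ the theorem says that $\mathbf{E}g(\xi)$ can be computed either by summing over $\Omega$ or by summing over the values of $\xi$ weighted by the induced distribution $\mathbf{F}_\xi$. The following sketch compares the two computations; the fair die, the map `xi` and the function `g` are assumptions chosen for the example.

```python
from collections import defaultdict
from fractions import Fraction

P = {w: Fraction(1, 6) for w in range(1, 7)}   # assumed space: a fair die

def xi(w):
    return w % 3          # a random variable on Omega with values 0, 1, 2

def g(x):
    return x * x + 1      # a Borel function on the line

# Left-hand side: integrate eta = g(xi) over Omega with respect to P.
lhs = sum(g(xi(w)) * p for w, p in P.items())

# Right-hand side: build the induced distribution F_xi, then integrate g against it.
F_xi = defaultdict(Fraction)
for w, p in P.items():
    F_xi[xi(w)] += p
rhs = sum(g(x) * q for x, q in F_xi.items())
```

The second computation never refers to $\Omega$ once `F_xi` has been built, which is the practical content of the theorem.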


Introducing the notation

$$F_\xi(x) = \mathbf{P}(\xi < x),$$

we can also consider, along with the integral just discussed,

$$\int_{\mathbb{R}} g(x)\, \mathbf{F}_\xi(dx), \tag{A3.3.4}$$

the Riemann–Stieltjes integral

$$\int g(x)\, dF_\xi(x), \tag{A3.3.5}$$

the definition of which was given in Sect. 3.6. It was also shown there that, for
continuous functions $g(x)$, these integrals coincide. Moreover, we discussed in Sect. 3.6
some other conditions for these integrals to coincide.

Also recall that if

$$F_\xi(x) = \int_{-\infty}^{x} f_\xi(t)\, dt$$

and the functions $g(x)$ and $f_\xi(x)$ are Riemann integrable, then integrals (A3.3.4)
and (A3.3.5) coincide with the Riemann integral

$$\int g(x) f_\xi(x)\, dx.$$




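3.3.3 Product Measures and Iterated Integrals


Consider a two-dimensional random variable $\zeta = (\xi, \eta)$ given on $(\Omega, \mathfrak{F}, \mathbf{P})$. The
random variables $\xi$ and $\eta$ induce a sample probability space $(\mathbb{R}^2, \mathfrak{B}^2, \mathbf{F}_{\xi,\eta})$ with the
measure $\mathbf{F}_{\xi,\eta}$ given on elements of the $\sigma$-algebra $\mathfrak{B}^2$ of Borel sets on the plane (the
$\sigma$-algebra generated by rectangles) and such that

$$\mathbf{F}_{\xi,\eta}(A \times B) = \mathbf{P}(\xi \in A, \eta \in B).$$

Here $A \times B$ is the set of points $(x, y)$ for which $x \in A$ and $y \in B$. If $g(x, y)$ is a
Borel function ($\{(x, y) : g(x, y) \in B\} \in \mathfrak{B}^2$ for each $B \in \mathfrak{B}$), then it easily follows
from the above that

$$\mathbf{E}g(\xi, \eta) = \int g(x, y)\, \mathbf{F}_{\xi,\eta}(dx\, dy), \tag{A3.3.6}$$

since both integrals are equal to $\int x\, \mathbf{F}_\theta(dx)$ for $\theta = g(\xi, \eta)$.
Now let $\xi$ and $\eta$ be independent random variables, i.e.

$$\mathbf{P}(\xi \in A, \eta \in B) = \mathbf{P}(\xi \in A)\mathbf{P}(\eta \in B)$$

for any $A, B \in \mathfrak{B}$.


Theorem A3.3.4 (Fubini's theorem on iterated integrals) If $g(x, y) \ge 0$ is a Borel
function and $\xi$ and $\eta$ are independent, then

$$\mathbf{E}g(\xi, \eta) = \mathbf{E}\big[\mathbf{E}g(x, \eta)\big|_{x=\xi}\big].$$

For arbitrary Borel functions $g(x, y)$ the above equality holds if $\mathbf{E}g(\xi, \eta)$ exists.


This very assertion we stated in Chap. 3 in the form

$$\int g(x, y)\, \mathbf{F}_{\xi,\eta}(dx\, dy) = \int \Big[\int g(x, y)\, \mathbf{F}_\eta(dy)\Big] \mathbf{F}_\xi(dx). \tag{A3.3.7}$$

We will need the following.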
Lemma A3.3.1
1. The section

$$B_x = \{y : (x, y) \in B\}$$

of any set $B \in \mathfrak{B}^2$ is measurable: $B_x \in \mathfrak{B}$.

2. The section $g_x(y) = g(x, y)$ of any Borel function $g$ ($\mathfrak{B}^2$-measurable) is a
Borel function.

3. The integral

$$\int g(x, y)\, \mathbf{F}_\eta(dy) \tag{A3.3.8}$$

of a Borel function $g$ is a Borel function of $x$.




Proof 1. Let $\mathcal{K}_1$ be the class of all sets from $\mathfrak{B}^2$ of which all $x$-sections are measurable.
It is evident that $\mathcal{K}_1$ contains all rectangles $B = B_{(1)} \times B_{(2)}$, where $B_{(1)} \in \mathfrak{B}$
and $B_{(2)} \in \mathfrak{B}$. Moreover, $\mathcal{K}_1$ is a $\sigma$-algebra. Indeed, consider for example the set
$B = \bigcup_k B^{(k)}$ where $B^{(k)} \in \mathcal{K}_1$. The operation $\bigcup$ on the sets $B^{(k)}$ leads to the same
operation on their sections, so that $B_x = \bigcup_k B_x^{(k)} \in \mathfrak{B}$. For the other operations
($\bigcap$ and taking complements) the situation is similar. Thus, $\mathcal{K}_1$ is a $\sigma$-algebra
containing all rectangles. This means that $\mathfrak{B}^2 \subset \mathcal{K}_1$.

2. For $B \in \mathfrak{B}$, one has

$$g_x^{-1}(B) = \{y : g_x(y) \in B\} = \{y : g(x, y) \in B\} = \{y : (x, y) \in g^{-1}(B)\} = [g^{-1}(B)]_x \in \mathfrak{B}.$$

3. Integral (A3.3.8) is, as a function of $x$, the result of passing to the limit in
a sequence of measurable functions, and hence is measurable itself. The lemma is
proved.


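Proof of Theorem A3.3.4 First we prove (A3.3.7) in the case where $g(x, y) =
I_B(x, y)$, so that the theorem turns into the formula for consecutive computation
of the measure of the set $B \in \mathfrak{B}^2$:

$$\mathbf{P}((\xi, \eta) \in B) = \int \mathbf{P}((x, \eta) \in B)\, \mathbf{F}_\xi(dx) = \int \mathbf{F}_\eta(B_x)\, \mathbf{F}_\xi(dx). \tag{A3.3.9}$$

We introduce the set function

$$Q(B) = \int \mathbf{F}_\eta(B_x)\, \mathbf{F}_\xi(dx).$$

Clearly, $Q(B) \ge 0$ and $Q(\emptyset) = 0$. Further, if $B = \bigcup_k B^{(k)}$ and $B^{(k)}$ are disjoint,
then $B_x = \bigcup_k B_x^{(k)}$ and $B_x^{(k)}$ are also disjoint, and

$$Q(B) = \int \mathbf{F}_\eta\Big(\bigcup_k B_x^{(k)}\Big) \mathbf{F}_\xi(dx) = \sum_k \int \mathbf{F}_\eta\big(B_x^{(k)}\big)\, \mathbf{F}_\xi(dx) = \sum_k Q\big(B^{(k)}\big).$$

This means that $Q(B)$ is a measure.
The measure $Q(B)$ coincides with $\mathbf{F}_{\xi,\eta}(B) = \mathbf{P}((\xi, \eta) \in B)$ on rectangles $B =
B_{(1)} \times B_{(2)}$. Indeed, for rectangles,

$$B_x = \begin{cases} B_{(2)} & \text{for } x \in B_{(1)}, \\ \emptyset & \text{for } x \notin B_{(1)}, \end{cases}$$

and

$$\mathbf{P}((\xi, \eta) \in B) = \mathbf{F}_\xi(B_{(1)})\, \mathbf{F}_\eta(B_{(2)}) = \int_{B_{(1)}} \mathbf{F}_\eta(B_{(2)})\, \mathbf{F}_\xi(dx) = \int \mathbf{F}_\eta(B_x)\, \mathbf{F}_\xi(dx) = Q(B).$$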




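This means that the measures $Q$ and $\mathbf{F}_{\xi,\eta}$ coincide on the algebra generated by
rectangles. By the measure extension theorem we obtain that $Q = \mathbf{F}_{\xi,\eta}$.

We have proved (A3.3.9). This implies that Fubini's theorem holds for simple
functions $g_N = \sum_{j=1}^{N} c_j I_{A_j}$, because

$$\mathbf{E}g_N(\xi, \eta) = \sum_{j=1}^{N} c_j \mathbf{E}I_{A_j}(\xi, \eta) = \sum_{j=1}^{N} c_j \int \mathbf{E}I_{A_j}(x, \eta)\, \mathbf{F}_\xi(dx) = \int \mathbf{E}g_N(x, \eta)\, \mathbf{F}_\xi(dx). \tag{A3.3.10}$$

Now if $g \ge 0$ is an arbitrary Borel function then there exists a sequence of simple
functions $g_N \uparrow g$ and, as in (A3.2.1), it remains to pass to the limit:

$$\mathbf{E}g(\xi, \eta) = \lim_{N\to\infty} \mathbf{E}g_N(\xi, \eta) = \lim_{N\to\infty} \int \mathbf{E}g_N(x, \eta)\, \mathbf{F}_\xi(dx) = \int \mathbf{E}g(x, \eta)\, \mathbf{F}_\xi(dx).$$

For an arbitrary function $g$ one has to use the representation $g = g^+ - g^-$, $g^+ \ge 0$,
$g^- \ge 0$. The theorem is proved.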
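
For independent discrete $\xi$ and $\eta$ the two sides of (A3.3.7) are finite sums, and the identity can be checked directly. In the sketch below the two laws and the function `g` are assumptions made for the example; `double` is the integral with respect to the product measure and `iterated` is $\mathbf{E}[\mathbf{E}g(x, \eta)|_{x=\xi}]$.

```python
from fractions import Fraction
from itertools import product

# Assumed laws of two independent discrete random variables.
F_xi = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
F_eta = {-1: Fraction(1, 4), 3: Fraction(3, 4)}

def g(x, y):
    return (x + 1) * y * y    # a nonnegative function of two variables

# Double integral: by independence the joint law is the product of the marginals.
double = sum(g(x, y) * F_xi[x] * F_eta[y] for x, y in product(F_xi, F_eta))

# Iterated integral: inner[x] = E g(x, eta), then average over the law of xi.
inner = {x: sum(g(x, y) * q for y, q in F_eta.items()) for x in F_xi}
iterated = sum(inner[x] * p for x, p in F_xi.items())
```

Here `inner` is the function of $x$ whose measurability is the subject of part 3 of Lemma A3.3.1.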


Remark A3.1 We see from the proof of the theorem that the random variables $\xi$ and
$\eta$ do not need to be scalar. The assertion remains true in a more general form (see
property 5A in Sect. 4.8) and, in particular, for vector-valued $\xi$ and $\eta$.


3.4 The Integral with Respect to an Arbitrary Measure 


If $\mu$ is a finite measure on $(\Omega, \mathfrak{F})$, $\mu(\Omega) < \infty$, then the definition of the integral
$\int_A \xi\, d\mu$ with respect to the measure $\mu$ does not differ from that of the integral with
respect to a probability measure (one could just put $\int_A \xi\, d\mu = \mu(\Omega) \int_A \xi\, d\mathbf{P}$, where
$\mathbf{P}(B) = \mu(B)/\mu(\Omega)$ is a probability distribution). If $\mu$ is $\sigma$-finite and $\mu(\Omega) = \infty$,
then the situation is somewhat more complicated, although it can still be reduced to
the already used constructions. First we will make several preliminary remarks.
Let $(\Omega, \mathfrak{F}, \mathbf{P})$ be a probability space and $f = f(\omega) \ge 0$ an a.e. finite nonnegative
measurable function (i.e., a random variable). Consider the set function

$$\mu(A) = \int_A f\, d\mathbf{P}. \tag{A3.4.1}$$

If $f$ is integrable ($\mu(\Omega) < \infty$) then $\mu(A)$ is a finite $\sigma$-additive measure (see
property I1) satisfying conditions (1)–(3) of Sect. 3.1 of the present appendix. In other
words, $\mu$ is a finite measure on $(\Omega, \mathfrak{F})$. But if $f$ is not integrable, then $\mu$ is a $\sigma$-finite
measure, which immediately follows from the representation

$$\mu(A) = \sum_{k=1}^{\infty} \int_{A \cap \{k-1 \le f < k\}} f\, d\mathbf{P}$$

(the integrals in the sum that are equal to $\int_A f\, I(k-1 \le f < k)\, d\mathbf{P}$ are clearly finite
measures).

Thus, the integral of the form (A3.4.1) is a measure for any distribution $\mathbf{P}$ and
function $f \ge 0$. It turns out that the following assertion, converse in a certain sense
to the above, also holds.


Lemma A3.4.1 For any measure $\mu$ on $(\Omega, \mathfrak{F})$, there exists a distribution $\mathbf{P}$ on that
space and a measurable function $f \ge 0$ such that representation (A3.4.1) holds.


Thus, any measure can be represented as an integral with respect to a probability 
measure (i.e., in the form E( f; A) for the respective function f and distribution P). 


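Proof Let $\mu$ be a $\sigma$-finite measure on $(\Omega, \mathfrak{F})$, and sets $B_j \in \mathfrak{F}$, $j = 1, 2, \ldots$, possess
the properties $\bigcup_{j=1}^{\infty} B_j = \Omega$, $B_i B_j = \emptyset$ for $i \ne j$, and $\mu(B_j) < \infty$. Put

$$\mathbf{P}(A) := \sum_{k=1}^{\infty} \frac{\mu(A B_k)}{2^k \mu(B_k)}. \tag{A3.4.2}$$

Obviously, $\mathbf{P}(\Omega) = 1$ and $\mathbf{P}$ is a measure. Further, if $A \subset B_k$ then

$$\mu(A) = 2^k \mu(B_k) \mathbf{P}(A).$$

This means that we should put $f(\omega) := 2^k \mu(B_k)$ for $\omega \in B_k$. Then the set function

$$\widetilde{\mu}(A) = \int_A f\, d\mathbf{P} = \int_\Omega f I_A\, d\mathbf{P}$$

will coincide with $\mu(A)$:

$$\widetilde{\mu}(A) = \sum_{k=1}^{\infty} 2^k \mu(B_k) \mathbf{P}(A B_k) = \sum_{k=1}^{\infty} 2^k \mu(B_k) \frac{\mu(A B_k)}{2^k \mu(B_k)} = \sum_{k=1}^{\infty} \mu(A B_k) = \mu(A).$$

The lemma is proved.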


Besides the required assertion, we also obtain that in representation (A3.4.1) the 
range of values of the function f can be assumed without loss of generality to be 
countable. 
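
The construction in the proof can be followed on a simple infinite measure. As an assumed example, take $\Omega = \{1, 2, \ldots\}$ with $\mu(\{k\}) = k$ and the partition $B_k = \{k\}$. Then (A3.4.2) gives $\mathbf{P}(\{k\}) = 2^{-k}$ and the density is $f(k) = 2^k \mu(B_k)$. The sketch checks representation (A3.4.1) on a finite set `A`; all names are hypothetical.

```python
from fractions import Fraction

def mu_point(k):
    return k                       # assumed sigma-finite measure: mu({k}) = k

def P_point(k):
    # (A3.4.2) with B_k = {k}: mu({k}) / (2**k * mu(B_k)) = 2**-k
    return Fraction(mu_point(k), 2**k * mu_point(k))

def f(k):
    return 2**k * mu_point(k)      # density of mu with respect to P on B_k

def mu(A):
    return sum(mu_point(k) for k in A)

def integral_f_dP(A):
    return sum(f(k) * P_point(k) for k in A)

A = {1, 4, 9, 20}
partial_mass = sum(P_point(k) for k in range(1, 31))   # first 30 terms of P(Omega)
```

The density takes only countably many values, one per set $B_k$, as noted after the proof.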




The function $f$ for which equality (A3.4.1) holds is called the density of the
measure $\mu$ with respect to $\mathbf{P}$ (or Radon–Nikodym derivative of the measure $\mu$ with
respect to $\mathbf{P}$) and is denoted by $d\mu/d\mathbf{P}$. It is evident that alteration of the function
$f = d\mu/d\mathbf{P}$ on a set of zero $\mathbf{P}$-measure leaves the equality (A3.4.1) unchanged.

Now let $\mu$ and $\mathbf{P}$ be two given arbitrary measures. The question of under what
conditions these two measures $\mu$ and $\mathbf{P}$ could be related by (A3.4.1) and whether
the function $f$ is determined uniquely thereby (up to values on a set of zero
$\mathbf{P}$-measure) is rather important for probability theory. (We stress that, in the preceding
considerations, the measure $\mathbf{P}$ was constructed in a special way from the measure
$\mu$, or vice versa.) Answers to these questions are given by the Radon–Nikodym
theorem to be discussed in the next section.

Now, using the simple assertion of Lemma A3.4.1 we have just proved, we will
give the definition of the integral with respect to an arbitrary measure $\mu$.

Let $\mu$ be an arbitrary $\sigma$-finite measure on $(\Omega, \mathfrak{F})$ and $\xi \ge 0$ an $\mathfrak{F}$-measurable
function.

The integral $\int_A \xi\, d\mu$ over a set $A \in \mathfrak{F}$ of the function $\xi \ge 0$ with respect to the
measure $\mu$ is the integral

$$\int_A \xi\, d\mu = \int_A \xi \frac{d\mu}{d\mathbf{P}}\, d\mathbf{P} = \mathbf{E}\Big(\xi \frac{d\mu}{d\mathbf{P}}; A\Big) \tag{A3.4.3}$$

with respect to any distribution $\mathbf{P}$ satisfying equality (A3.4.1) (for example, with
respect to measure (A3.4.2)).

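This definition is consistent because it does not depend on the choice of $\mathbf{P}$.
Indeed, for simple functions $\xi$ ($\xi(\omega) = x_k$ for $\omega \in F_k$),

$$\int_A \xi\, d\mu = \sum_k x_k \int_A \frac{d\mu}{d\mathbf{P}} I_{F_k}\, d\mathbf{P} = \sum_k x_k \int_{A F_k} \frac{d\mu}{d\mathbf{P}}\, d\mathbf{P} = \sum_k x_k \mu(A F_k).$$

If now $\xi \ge 0$ is an arbitrary function, then by the monotone convergence theorem
the integral $\int_A \xi\, d\mu$ is equal to

$$\lim_{n\to\infty} \int_A \xi_n \frac{d\mu}{d\mathbf{P}}\, d\mathbf{P} = \lim_{n\to\infty} \int_A \xi_n\, d\mu,$$

where $\xi_n \uparrow \xi$ is a sequence of simple functions which converge monotonically to
$\xi$ (see Lemma A3.2.1). In both cases, the result does not depend on the choice of $\mathbf{P}$.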


The integral

$$\int_A \xi\, d\mu$$

of an arbitrary measurable function $\xi$ is defined by

$$\int_A \xi\, d\mu = \int_A \xi^+\, d\mu - \int_A \xi^-\, d\mu,$$

when both expressions on the right-hand side are finite. (In that case one says
that the integral $\int_A \xi\, d\mu$ exists.) Here, as before, $\xi^+ = \max(0, \xi) \ge 0$ and $\xi^- =
\max(0, -\xi) \ge 0$, so that $\xi = \xi^+ - \xi^-$.

Thus we see that the above definition of the integral with respect to an arbitrary
measure is essentially equivalent to the construction used in Sect. 3.2 of the present
appendix. However, the definition in the form (A3.4.3) saves us from the necessity
of repeating what we have already done (and now in a more complex setting) and
enables one to transfer all the properties of the integrals $\int \xi\, d\mathbf{P}$ to the general case.
We will list the basic properties preserving the existing numeration.


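I1. $\int \xi\, d\mu = \sum_j \int_{A_j} \xi\, d\mu$ if $A_j$ are disjoint and $\bigcup_j A_j = \Omega$.

I2. $\int (\xi + \eta)\, d\mu = \int \xi\, d\mu + \int \eta\, d\mu$.

I3. $\int a\xi\, d\mu = a \int \xi\, d\mu$.

I4. $\int \xi\, d\mu \le \int \eta\, d\mu$ if $\xi \le \eta$.

I5. $|\int \xi\, d\mu| \le \int |\xi|\, d\mu$.

I6. If $c_1 \le \xi(\omega) \le c_2$ for $\omega \in A$, then $c_1 \mu(A) \le \int_A \xi\, d\mu \le c_2 \mu(A)$.

I7. If $\xi \ge 0$ and $\int \xi\, d\mu = 0$, then $\mu(\xi > 0) = 0$.

I8. If $\mu(\xi \ne \eta) = 0$, then $\int \xi\, d\mu = \int \eta\, d\mu$.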


It is clear that all the convergence theorems remain valid as well. 


Theorem A3.4.1 (The dominated convergence theorem) Let $|\xi_n| \le \eta$ and $\int \eta\, d\mu$
exist. If $\xi_n \xrightarrow{\mu} \xi$ or $\xi_n \to \xi$ a.e. as $n \to \infty$ then

$$\int \xi_n\, d\mu \to \int \xi\, d\mu.$$


Theorem A3.4.2 (The monotone convergence theorem) If $0 \le \xi_n \uparrow \xi$ as $n \to \infty$
then

$$\int \xi_n\, d\mu \to \int \xi\, d\mu.$$


Theorem A3.4.3 (Fatou–Lebesgue) The statement and proof of this theorem are
obtained from those of Theorem A3.3.2 by replacing $\mathbf{P}$ with $\mu$.


In conclusion we note that if $\Omega = \mathbb{R} = (-\infty, \infty)$, $\mathfrak{F} = \mathfrak{B}$ is the $\sigma$-algebra of
Borel sets, $\mu$ is the Lebesgue measure, and the function $g(x)$ is continuous, then
the integral $\int_{[a,b]} g(x)\, d\mu(x)$ coincides with the Riemann integral $\int_a^b g(x)\, dx$. This
follows from the preceding remarks in part 2 of Sect. 3.3 of this appendix.


3.5 The Lebesgue Decomposition Theorem and the 
Radon-Nikodym Theorem 


We return to a question that has already been asked in the previous section. Under
what conditions on measures $\mu$ and $\lambda$ given on $(\Omega, \mathfrak{F})$ can the measure $\mu$ be
represented as

$$\mu(A) = \int_A f\, d\lambda?$$

We do not assume here that $\lambda$ is a probability measure.


Definition A3.5.1 A measure $\mu$ is said to be absolutely continuous with respect to
a measure $\lambda$ (we write $\mu \prec \lambda$) if, for any $A$ such that $\lambda(A) = 0$, one has $\mu(A) = 0$.


Definition A3.5.2 A set $N_\mu$ is said to be a support¹ of measure $\mu$ if
$\mu(\Omega - N_\mu) = 0$.


Definition A3.5.2 specifies a rather wide class of sets which can be called the
support of the measure $\mu$ when $\mu$ is concentrated on a part of the space $\Omega$. If $\Omega = \mathbb{R}$
is the real line (and in some other cases as well), one can use another definition
which specifies a unique set for each measure. Consider the collection of all intervals
$(a, b) \subset \mathbb{R}$ with rational endpoints $a$ and $b$. This collection is countable. Remove
from $\Omega = \mathbb{R}$ all such intervals for which $\mu((a, b)) = 0$. The remaining set (which is
measurable) is called the support of the measure $\mu$.


Definition A3.5.3 One says that a measure $\mu$ is singular with respect to $\lambda$ if there
exists a support $N_\lambda$ of the measure $\lambda$ such that $\mu(N_\lambda) = 0$. Or, which is the same, if
there exists a support $N_\mu$ of the measure $\mu$ such that $\lambda(N_\mu) = 0$.


The last definition, in contrast to Definition A3.5.1, is symmetric, so one can
speak about mutually singular measures $\mu$ and $\lambda$ (this relation is often written as
$\mu \perp \lambda$).


Theorem A3.5.1 (Radon–Nikodym) A necessary and sufficient condition for
the absolute continuity $\mu \prec \lambda$ is that there exists a function $f$ unique up to
$\lambda$-equivalence (i.e., up to values on a set of zero $\lambda$-measure) such that²

$$\mu(A) = \int_A f\, d\lambda.$$

As we have already noted, the function $f$ is called the Radon–Nikodym derivative
$d\mu/d\lambda$ of the measure $\mu$ with respect to $\lambda$ (or density of $\mu$ with respect to $\lambda$).

Since sufficiency in the assertion of the theorem is obvious, we will obtain the
Radon–Nikodym theorem as a consequence of the following Lebesgue decomposition
theorem.


¹The conventional definition of support refers to the case when $\Omega$ is a topological space. Then the
support of $\mu$ is the smallest closed set such that its complement is of $\mu$-measure zero.


²This equality is sometimes adopted as a definition of absolute continuity.




Theorem A3.5.2 (Lebesgue) Let $\mu$ and $\lambda$ be two $\sigma$-finite measures given on $(\Omega, \mathfrak{F})$.
There exists a unique decomposition of the measure $\mu$ into two components $\mu_a$ and
$\mu_s$ such that

$$\mu_a \prec \lambda, \qquad \mu_s \perp \lambda.$$

Moreover, there exists a function $f$, unique up to $\lambda$-equivalence, such that

$$\mu_a(A) = \int_A f\, d\lambda.$$

It is obvious that if $\mu \prec \lambda$ then $\mu_s = 0$, and the Lebesgue theorem then implies
the Radon–Nikodym theorem.


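Proof Since $\mu$ and $\lambda$ are $\sigma$-finite, there exist increasing sequences of sets $\Omega_n^\mu$
and $\Omega_n^\lambda$ such that

$$\mu(\Omega_n^\mu) < \infty, \qquad \lambda(\Omega_n^\lambda) < \infty, \qquad \bigcup_n \Omega_n^\mu = \Omega, \qquad \bigcup_n \Omega_n^\lambda = \Omega.$$

Putting $\Omega_n := \Omega_n^\mu \cap \Omega_n^\lambda$ we obtain a sequence of sets increasing to $\Omega$ for which
$\mu(\Omega_n) < \infty$, $\lambda(\Omega_n) < \infty$.

If we prove the decomposition theorem for restrictions of the measures $\mu$ and $\lambda$
to $(B_n, \mathfrak{F}_n)$, where $B_n = \Omega_{n+1} - \Omega_n$ and $\mathfrak{F}_n$ is formed by sets $B_n A$, $A \in \mathfrak{F}$, we will
thereby prove it for the whole $\Omega$. It will suffice to take $\mu_a$ and $\mu_s$ to be the sums of
the respective components for each of the restrictions. This remark means that we
can consider the case of finite measures only.

Thus let $\mu$ and $\lambda$ be finite measures.

(a) Let $\mathcal{F}$ be the class of functions $f \ge 0$ such that

$$\int_A f\, d\lambda \le \mu(A) \quad \text{for all } A \in \mathfrak{F} \tag{A3.5.1}$$

(the class $\mathcal{F}$ is non-empty, for the function $f \equiv 0$ belongs to $\mathcal{F}$). Set

$$\alpha := \sup_{f \in \mathcal{F}} \int_\Omega f\, d\lambda \le \mu(\Omega) < \infty$$

and choose a sequence $f_n$ such that, as $n \to \infty$,

$$\int_\Omega f_n\, d\lambda \to \alpha.$$

Put $\hat{f}_n := \max(f_1, \ldots, f_n)$. Then clearly $\hat{f}_n \uparrow \hat{f} := \sup_n f_n$ and by the monotone
convergence theorem

$$\int_A \hat{f}_n\, d\lambda \to \int_A \hat{f}\, d\lambda. \tag{A3.5.2}$$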
A A 




We now show that $\hat{f} \in \mathcal{F}$, i.e., that (A3.5.1) holds for $\hat{f}$. To this end, it suffices
by virtue of (A3.5.2) to notice that $\hat{f}_n \in \mathcal{F}$. Let $A_k$, $k = 1, \ldots, n$, be disjoint sets on
which $\hat{f}_n = f_k$. Then $\bigcup_{k=1}^{n} A_k = \Omega$ and

$$\int_A \hat{f}_n\, d\lambda = \sum_{k=1}^{n} \int_{A A_k} f_k\, d\lambda \le \sum_{k=1}^{n} \mu(A A_k) = \mu(A).$$

Thus, for the "maximum" element $\hat{f}$ of $\mathcal{F}$, (A3.5.1) also holds.
(b) Putting

$$\mu_a(A) := \int_A \hat{f}\, d\lambda, \qquad \mu_s := \mu - \mu_a, \tag{A3.5.3}$$

we prove that $\mu_s$ is singular with respect to $\lambda$. We will need the following assertion
about the decomposition of an arbitrary signed measure (for the definition, see
Sect. 3.3.1 of this appendix).


Theorem A3.5.3 (The Hahn theorem on decomposition of a measure) For any
signed finite measure $\gamma$, there exist disjoint sets $D^+ \in \mathfrak{F}$ and $D^- \in \mathfrak{F}$ such that, for
any $A \in \mathfrak{F}$,

$$\gamma(A D^+) \ge 0, \qquad \gamma(A D^-) \le 0.$$

Proof We first prove that there exists a set $D \in \mathfrak{F}$ on which $\gamma(A)$ attains its upper
bound.

Let $B_n \in \mathfrak{F}$ be a sequence such that $\gamma(B_n) \to \Gamma := \sup_A \gamma(A)$ as $n \to \infty$.
Put $B := \bigcup_k B_k$ and consider, for a fixed $n$, the decomposition of $B$ into $2^n$ sets
$B_{n,m}$, $m = 1, \ldots, 2^n$, of the form $\bigcap_{k=1}^{n} B_k'$, where $B_k' = B_k$ or $B - B_k$, $k \le n$. For
$n < N$, each $B_{n,m}$ is a finite union of sets $B_{N,M}$, $1 \le M \le 2^N$. Denote by $D_n$ the
sum of all $B_{n,m}$ for which $\gamma(B_{n,m}) \ge 0$. Then $\gamma(B_n) \le \gamma(D_n)$.

On the other hand, for $N > n$, each $B_{N,M}$ either belongs to $D_n$ or is disjoint with
it. Therefore

$$\gamma(D_n) \le \gamma(D_n + D_{n+1} + \cdots + D_N).$$

This implies that, for the set $D = \lim_{n\to\infty} \bigcup_{k=n}^{\infty} D_k$, one has $\gamma(B_n) \le \gamma(D)$, $\Gamma \le
\gamma(D)$. Recalling the definition of $\Gamma$, we obtain that $\gamma(D) = \Gamma$.

Thus we have proved the existence of a set $D$ on which $\gamma(D)$ attains its maximum.
We now show that, for any $A \in \mathfrak{F}$, one has $\gamma(A D) \ge 0$ and $\gamma(A \overline{D}) \le 0$,
where $\overline{D} = \Omega - D$. Indeed, assuming, for instance, that $\gamma(A D) < 0$, we come to a
contradiction, for in that case

$$\gamma(D - A D) = \gamma(D) - \gamma(A D) > \gamma(D).$$

Similarly, assuming that $\gamma(A \overline{D}) > 0$, we would get

$$\gamma(D + A \overline{D}) = \gamma(D) + \gamma(A \overline{D}) > \gamma(D).$$

It remains to put $D^+ := D$, $D^- := \overline{D}$. The theorem is proved.




Corollary A3.5.1 Any finite signed measure $\gamma$ can be represented as $\gamma = \gamma^+ - \gamma^-$,
where $\gamma^\pm$ are finite nonnegative measures.


To prove the corollary, it suffices to put

$$\gamma^\pm(A) := \pm\gamma(A \cap D^\pm),$$

where $D^\pm$ are the sets from the Hahn decomposition theorem.
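
On a finite space a signed measure is determined by its point masses, and the Hahn sets can be written down at once: $D^+$ collects the points of nonnegative mass. The sketch below, with assumed point masses, verifies by brute force over all subsets that $\gamma(A D^+) \ge 0$, $\gamma(A D^-) \le 0$, and that $D^+$ is a set on which $\gamma$ attains its upper bound, as in the first step of the proof.

```python
from itertools import chain, combinations

# Assumed signed point masses on a five-point space.
gamma_point = {"a": 3, "b": -2, "c": 0, "d": 5, "e": -7}

def gamma(A):
    return sum(gamma_point[w] for w in A)

D_plus = {w for w, m in gamma_point.items() if m >= 0}
D_minus = set(gamma_point) - D_plus

# Enumerate every subset A of the space.
omega = list(gamma_point)
subsets = [set(c) for c in chain.from_iterable(
    combinations(omega, r) for r in range(len(omega) + 1))]

hahn_ok = all(gamma(A & D_plus) >= 0 and gamma(A & D_minus) <= 0 for A in subsets)
sup_gamma = max(gamma(A) for A in subsets)

# Jordan-type decomposition from the corollary: gamma = gamma_plus - gamma_minus.
jordan_ok = all(gamma(A) == gamma(A & D_plus) - (-gamma(A & D_minus)) for A in subsets)
```

The point `"c"` of zero mass could equally be placed in $D^-$, which illustrates that the Hahn sets are determined only up to sets of zero $|\gamma|$-measure.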


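We return to the proof of the fact that the measure $\mu_s$ in equality (A3.5.3) is
singular. Let $D_n^+$ be the set in the Hahn decomposition of the signed measure

$$\gamma_n = \mu_s - \frac{1}{n}\lambda.$$

Put $N = \bigcap_n D_n^-$. Then $\overline{N} = \bigcup_n D_n^+$ and, for all $n$ and $A \in \mathfrak{F}$,

$$0 \le \mu_s(A N) \le \frac{1}{n}\lambda(A N).$$

From here, letting $n \to \infty$, we obtain $\mu_s(A N) = 0$ and hence $\mu_s(A) = \mu_s(A \overline{N})$.
That is, the set $\overline{N}$ is a support of the measure $\mu_s$.
Further, because

$$\mu_a(A) = \mu(A) - \mu_s(A \overline{N}) \le \mu(A) - \mu_s(A D_n^+),$$

we have

$$\int_A \Big(\hat{f} + \frac{1}{n} I_{D_n^+}\Big) d\lambda = \mu_a(A) + \frac{1}{n}\lambda(A D_n^+) \le \mu(A) - \gamma_n(A D_n^+) \le \mu(A).$$

This means that $\hat{f} + \frac{1}{n} I_{D_n^+} \in \mathcal{F}$ and hence

$$\alpha \ge \int \Big(\hat{f} + \frac{1}{n} I_{D_n^+}\Big) d\lambda = \alpha + \frac{1}{n}\lambda(D_n^+).$$

This implies $\lambda(D_n^+) = 0$ and $\lambda(\overline{N}) = 0$, so that $\mu_s$ is singular with respect to $\lambda$ since
$\overline{N}$ is a support of $\mu_s$.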


Uniqueness of the decomposition $\mu = \mu_a + \mu_s$ can be established as follows.
Assume that $\mu = \mu_a' + \mu_s'$ is another decomposition. Then $\gamma := \mu_a' - \mu_a = \mu_s - \mu_s'$.
By singularity, there exist sets $N$ and $N'$ such that $\mu_s(\overline{N}) = 0$, $\lambda(N) = 0$, $\mu_s'(\overline{N'}) =
0$, and $\lambda(N') = 0$. Clearly, $\lambda(D) = 0$, where $D = N \cup N'$. If we assumed that $\gamma =
\mu_a' - \mu_a = \mu_s - \mu_s' \ne 0$, then there would exist an $A \in \mathfrak{F}$ such that $\gamma(A) \ne 0$.
Therefore, either $\gamma(A D) \ne 0$ or $\gamma(A \overline{D}) \ne 0$. However, the former is impossible,
for $\lambda(D) = 0$ implies $\mu_a'(D) = \mu_a(D) = 0$. The latter is also impossible, since $\overline{D} =
\overline{N} \cap \overline{N'}$ and hence $\mu_s(\overline{D}) = \mu_s'(\overline{D}) = 0$.




Uniqueness of the function $f$ (up to $\lambda$-equivalence) follows from the observation
that the equalities

$$\mu_a(A) = \int_A f\, d\lambda = \int_A f'\, d\lambda, \qquad \int_A (f - f')\, d\lambda = 0$$

for all $A$ imply the equality $f - f' = 0$ a.e. Assuming, say, that $\lambda(A) > 0$ for
$A = \{\omega : f - f' > \varepsilon\}$ would yield for such $A$ the relation $\int_A (f - f')\, d\lambda > 0$. The
theorem is proved.
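
On a finite space the Lebesgue decomposition is explicit: $\mu_a$ is the part of $\mu$ carried by the points that $\lambda$ charges, with density equal to the ratio of point masses, and $\mu_s$ is the part carried by the $\lambda$-null points. The sketch below uses assumed point masses and checks the three defining properties; the names are hypothetical.

```python
from fractions import Fraction

# Assumed point masses of two finite measures on a four-point space.
lam = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(0), "d": Fraction(0)}
mu = {"a": Fraction(1, 4), "b": Fraction(0), "c": Fraction(1, 2), "d": Fraction(1, 4)}

# Density f of mu_a with respect to lam: ratio of masses where lam > 0.
# On the lam-null points any value would do; 0 is chosen here.
f = {w: (mu[w] / lam[w] if lam[w] > 0 else Fraction(0)) for w in lam}

mu_a = {w: f[w] * lam[w] for w in lam}     # mu_a(A) = integral of f over A w.r.t. lam
mu_s = {w: mu[w] - mu_a[w] for w in lam}   # mu_s = mu - mu_a

N = {w for w in lam if lam[w] == 0}        # a lam-null set carrying all of mu_s
```

The freedom in choosing `f` on `N` is exactly the uniqueness "up to $\lambda$-equivalence" in the theorem.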


One of the most important applications of the Radon—Nikodym theorem is the 
proof of existence and uniqueness of conditional expectations. 


Proof Let $\mathfrak{F}_0$ be a $\sigma$-subalgebra of $\mathfrak{F}$ and $\xi$ a random variable on $(\Omega, \mathfrak{F}, \mathbf{P})$ such
that $\mathbf{E}\xi$ exists. In Sect. 4.8 we defined the conditional expectation $\mathbf{E}(\xi \mid \mathfrak{F}_0)$ of the
variable $\xi$ given $\mathfrak{F}_0$ as an $\mathfrak{F}_0$-measurable random variable $\eta$ for which

$$\mathbf{E}(\eta; B) = \mathbf{E}(\xi; B) \tag{A3.5.4}$$

for any $B \in \mathfrak{F}_0$. We can assume without loss of generality that $\xi \ge 0$ (an arbitrary
function $\xi$ can be represented as a difference of two nonnegative functions). Then
the right-hand side of (A3.5.4) will be a measure on $(\Omega, \mathfrak{F}_0)$. Since $\mathbf{E}(\xi; B) = 0$
if $\mathbf{P}(B) = 0$, this measure will be absolutely continuous with respect to $\mathbf{P}$. This
implies, by the Radon–Nikodym theorem, the existence of a unique (up to
$\mathbf{P}$-equivalence) measurable function $\eta$ on $(\Omega, \mathfrak{F}_0)$ such that, for any $B \in \mathfrak{F}_0$,

$$\mathbf{E}(\xi; B) = \int_B \eta\, d\mathbf{P}.$$

This relation is clearly equivalent to (A3.5.4). It establishes the required existence
and uniqueness of the conditional expectation.
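
When $\mathfrak{F}_0$ is generated by a finite partition, the Radon–Nikodym derivative in this argument is constant on each atom and equals $\mathbf{E}(\xi; \text{atom})/\mathbf{P}(\text{atom})$. The following sketch builds $\eta$ in this way and checks (A3.5.4) on a set from $\mathfrak{F}_0$; the fair die, the variable `xi` and the partition `atoms` are assumptions made for the example.

```python
from fractions import Fraction

P = {w: Fraction(1, 6) for w in range(1, 7)}   # assumed space: a fair die
xi = {w: w * w for w in P}                     # a nonnegative random variable
atoms = [{1, 2}, {3, 4, 5}, {6}]               # assumed partition generating F_0

def E(f, B):
    """E(f; B) as a finite sum."""
    return sum(f[w] * P[w] for w in B)

def prob(B):
    return sum(P[w] for w in B)

# Density of the measure B -> E(xi; B) with respect to P on F_0:
# constant on every atom, hence F_0-measurable.
eta = {w: E(xi, atom) / prob(atom) for atom in atoms for w in atom}

B = atoms[0] | atoms[2]                        # a typical set from F_0
```

On sets outside $\mathfrak{F}_0$, such as $\{1\}$, the equality $\mathbf{E}(\eta; B) = \mathbf{E}(\xi; B)$ is not required and in general fails.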


Another consequence of the assertions proved in the present section was mentioned
in Sect. 3.6 and is related to the Lebesgue theorem stating that any distribution
$\mathbf{P}$ on the real line $\mathbb{R} = (-\infty, \infty)$ (or the respective distribution function) has a
unique representation as a sum of the three components $\mathbf{P} = \mathbf{P}_a + \mathbf{P}_s + \mathbf{P}_d$, where
the component $\mathbf{P}_a$ is absolutely continuous with respect to Lebesgue measure:

$$\mathbf{P}_a(A) = \int_A f(x)\, dx;$$

$\mathbf{P}_d$ is the discrete component concentrated on an at most countable set of points
$x_1, x_2, \ldots$ such that $\mathbf{P}(\{x_k\}) > 0$, and the component $\mathbf{P}_s$ has a support of Lebesgue
measure zero and a continuous distribution function. This is an immediate consequence
of the Lebesgue decomposition theorem. One just has to extract the discrete
part from the singular (with respect to Lebesgue measure $\lambda$) component of $\mathbf{P}$,
first removing all the points $x$ for which $\mathbf{P}(\{x\}) \ge 1/2$, then all points $x$ for which
$\mathbf{P}(\{x\}) \ge 1/3$, and so on. It is clear that in this way we will get at most a countable
set of $x$s, and that this process determines uniquely the discrete component $\mathbf{P}_d$.

All the aforesaid clearly also applies to distributions in $n$-dimensional Euclidean
spaces $\mathbb{R}^n$.


3.6 Weak Convergence and Convergence in Total Variation of 
Distributions in Arbitrary Spaces 


3.6.1 Weak Convergence 


In Sects. 6.2 and 7.6 we studied weak convergence of distributions of random variables
and vectors, i.e., weak convergence of distributions in $\mathbb{R}^k$, $k \ge 1$. Now we
want to introduce the notion of weak convergence in more general spaces $\mathcal{X}$. As the
definitions given in Sect. 6.2 show, we will need continuous functions $f(x)$ on $\mathcal{X}$.
This is possible only if the space $\mathcal{X}$ is endowed with a topology. For simplicity's
sake, we restrict ourselves to the case where the space $\mathcal{X}$ is endowed with a metric
$\rho(x, y)$. Thus, assume we are given a measurable space $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ with a metric $\rho$
which is "consistent" with the $\sigma$-algebra $\mathfrak{B}_{\mathcal{X}}$, i.e., all open (with respect to the metric
$\rho$) sets from $\mathcal{X}$ belong to $\mathfrak{B}_{\mathcal{X}}$ (cf. Sect. 16.1), so that any continuous (with respect
to $\rho$) functional will be $\mathfrak{B}_{\mathcal{X}}$-measurable. This means that if a distribution $\mathbf{Q}$ is given
on $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ (i.e., a probability space $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}}, \mathbf{Q})$ is given), then $\{x : f(x) < t\} \in \mathfrak{B}_{\mathcal{X}}$ for
any $t$, and the probabilities of these sets are defined.

Now let $(\Omega, \mathfrak{F}, \mathbf{P})$ be the basic probability space. A measurable mapping
$\xi = \xi(\omega)$ of the space $(\Omega, \mathfrak{F})$ to $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ is called an $\mathcal{X}$-valued random element.
If $(\Omega, \mathfrak{F}) = (\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$, the mapping $\xi$ may be the identity mapping. The space $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$
is said to be the sample or state space of the random element $\xi$. When a functional
$f$ is continuous, $f(\xi)$ is a random variable in $(\mathbb{R}, \mathfrak{B})$.


Definition A3.6.1 Let a distribution $\mathbf{P}$ and a sequence of distributions $\mathbf{P}_n$ be given
on the space $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$. The sequence $\mathbf{P}_n$ is said to converge weakly to $\mathbf{P}$: $\mathbf{P}_n \Rightarrow \mathbf{P}$ as
$n \to \infty$ if, for any bounded continuous functional $f$ ($f \in C_b(\mathcal{X})$),

$$\int f(x)\, d\mathbf{P}_n(x) \to \int f(x)\, d\mathbf{P}(x). \tag{A3.6.1}$$

If $\xi_n$ and $\xi$ are random elements having the distributions $\mathbf{P}_n$ and $\mathbf{P}$, respectively,
then (A3.6.1) is equivalent to

$$\mathbf{E}f(\xi_n) \to \mathbf{E}f(\xi). \tag{A3.6.2}$$

This, in turn, for any continuous functional $f$ ($f \in C(\mathcal{X})$), is equivalent to

$$f(\xi_n) \Rightarrow f(\xi). \tag{A3.6.3}$$

Indeed, (A3.6.3) means that, for any bounded continuous function $g$ ($g \in C_b(\mathbb{R})$),

$$\mathbf{E}g(f(\xi_n)) \to \mathbf{E}g(f(\xi)), \tag{A3.6.4}$$

which is equivalent to (A3.6.2).

If $\mathcal{X} = \mathcal{X}(T)$ is the space of real-valued functions $x(t)$, $t \in T$, given on a parametric
set $T$, and a measurable mapping $\xi(\omega)$ of the basic probability space $(\Omega, \mathfrak{F}, \mathbf{P})$
into $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ is given, then the random element $\xi(\omega) = \xi(\omega, t)$ will be a random process
(see Sect. 18.1) if $\{x : x(t) < u\} \in \mathfrak{B}_{\mathcal{X}}$ for all $t$, $u$. In that case (A3.6.1)–(A3.6.4)
will refer to the weak convergence of the distributions of random processes which
has already been studied in Chap. 20.

In the metric space $\mathcal{X}$, for any $A \subset \mathcal{X}$, one can define its boundary

$$\partial A = [A] - (A),$$

where $[A]$ is the closure of $A$, $(A)$ being its interior ($(A) = \mathcal{X} - [\overline{A}]$, where $\overline{A}$ is the
complement of $A$).


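Definition A3.6.2 A set $A$ is said to be a continuity set of the distribution $\mathbf{P}$ (or
$\mathbf{P}$-continuous set) if $\mathbf{P}(\partial A) = 0$. We will denote the class of all $\mathbf{P}$-continuous sets
by $\mathcal{D}_{\mathbf{P}}$.


The following criterion of weak convergence of distributions holds true.


Theorem A3.6.1 The following four conditions are equivalent:


(i) $\mathbf{P}_n \Rightarrow \mathbf{P}$,

(ii) $\lim_{n\to\infty} \mathbf{P}_n(A) = \mathbf{P}(A)$ for all $A \in \mathcal{D}_{\mathbf{P}}$,
(iii) $\limsup_{n\to\infty} \mathbf{P}_n(F) \le \mathbf{P}(F)$ for all closed $F \subset \mathcal{X}$,
(iv) $\liminf_{n\to\infty} \mathbf{P}_n(G) \ge \mathbf{P}(G)$ for all open $G \subset \mathcal{X}$.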
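
The role of continuity sets and the direction of the inequalities in (iii) and (iv) can be seen on the simplest assumed example: $\mathbf{P}_n$ is the unit mass at the point $1/n$ of the real line and $\mathbf{P}$ is the unit mass at 0, so that $\mathbf{P}_n \Rightarrow \mathbf{P}$. In the sketch, sets are represented by their indicator functions, and all names are hypothetical.

```python
def P_n(n, indicator):
    """Mass that the unit point mass at 1/n gives to a set (given by its indicator)."""
    return 1 if indicator(1 / n) else 0

def P(indicator):
    """Mass that the limit distribution, the unit point mass at 0, gives to a set."""
    return 1 if indicator(0) else 0

def half_open(x):
    return 0 < x <= 1      # A = (0, 1]; its boundary {0, 1} has P-mass 1

def closed_point(x):
    return x == 0          # F = {0}, a closed set

def open_interval(x):
    return -1 < x < 1      # G = (-1, 1), an open set whose boundary has P-mass 0

ns = range(2, 200)
masses_A = {P_n(n, half_open) for n in ns}
masses_F = {P_n(n, closed_point) for n in ns}
masses_G = {P_n(n, open_interval) for n in ns}
```

For $A = (0, 1]$ the masses $\mathbf{P}_n(A) = 1$ do not converge to $\mathbf{P}(A) = 0$, which does not contradict (ii) because $A \notin \mathcal{D}_{\mathbf{P}}$. For the closed set $F$ the inequality in (iii) is strict ($0 < 1$), and for the continuity set $G$ the masses converge to $\mathbf{P}(G) = 1$.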


Observe that if $\mathbf{P}_n \Rightarrow \mathbf{P}$, then convergence (A3.6.1)–(A3.6.3) takes place for
a wider class of functionals than $C_b(\mathcal{X})$ ($C(\mathcal{X})$), namely, for the so-called
$\mathbf{P}$-continuous functionals (or functionals continuous with $\mathbf{P}$-probability 1). We will
call so the functionals $f$ for which $f(x_n) \to f(x)$ as $\rho(x_n, x) \to 0$ not for all $x \in \mathcal{X}$,
but only for $x \in A$, $\mathbf{P}(A) = 1$. The class of $\mathbf{P}$-continuous functionals will be denoted
by $C_{\mathbf{P}}(\mathcal{X})$.

The classes $\mathcal{D}_{\mathbf{P}}$ and $C_{\mathbf{P}}(\mathcal{X})$, and also the classes of all closed and open sets
participating in Theorem A3.6.1, are very wide, which makes verifying the conditions
of Theorem A3.6.1 rather difficult and cumbersome. These classes can be substantially
restricted if we consider not arbitrary but only relatively compact sequences
$\mathbf{P}_n$ (from any subsequence $\mathbf{P}_{n'}$ one can choose a convergent subsequence; this
approach was already used in Sect. 6.3).


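Definition A3.6.3 A class $\mathcal{D}$ of sets from $\mathfrak{B}_{\mathcal{X}}$ is said to determine the measure $\mathbf{P}$ if,
for a measure $\mathbf{Q}$, the equalities $\mathbf{P}(A) = \mathbf{Q}(A)$ for all $A \in \mathcal{D}\mathcal{D}_{\mathbf{P}}$ imply $\mathbf{Q} = \mathbf{P}$.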




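A class $\mathcal{D}$ determines the measure $\mathbf{P}$ if $\mathcal{D}$ is an algebra and $\sigma(\mathcal{D}\mathcal{D}_{\mathbf{P}}) = \mathfrak{B}_{\mathcal{X}}$
(condition $\sigma(\mathcal{D}) = \mathfrak{B}_{\mathcal{X}}$ is insufficient (see e.g. [4])).

In a similar way we introduce the class $\mathcal{F}$ of functionals $f$ determining the distribution
$\mathbf{P}$ of a random element $\xi = \xi^{\mathbf{P}}$: for any $\mathbf{Q}$, the coincidence of the distributions
of $f(\xi^{\mathbf{P}})$ and $f(\xi^{\mathbf{Q}})$ for all $f \in \mathcal{F}C_{\mathbf{P}}(\mathcal{X})$ implies $\mathbf{P} = \mathbf{Q}$.


Theorem A3.6.2 A necessary and sufficient condition for convergence $\mathbf{P}_n \Rightarrow \mathbf{P}$ is
that:


(1) the sequence $\mathbf{P}_n$ is relatively compact; and
(2) there exists a class of sets $\mathcal{D} \subset \mathfrak{B}_{\mathcal{X}}$ determining the measure $\mathbf{P}$ and such that
$\mathbf{P}_n(A) \to \mathbf{P}(A)$ for any $A \in \mathcal{D}\mathcal{D}_{\mathbf{P}}$.


An alternative to condition (2) is the existence of a class of functionals $\mathcal{F}$ which
determines the measure $\mathbf{P}$ and is such that $\mathbf{P}(f(\xi_n) < t) \to \mathbf{P}(f(\xi) < t)$ for all
$f \in \mathcal{F}C_{\mathbf{P}}(\mathcal{X})$.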

The following notion of tightness plays an important role in establishing the com- 
pactness of {P,,}. 


Definition A3.6.4 A family of distributions $\{\mathbf{P}_n\}$ on $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ is said to be tight if,
for any $\varepsilon > 0$, there exists a compact set $K = K_\varepsilon \subset \mathcal{X}$ such that $\mathbf{P}_n(K) > 1 - \varepsilon$ for
all $n$.


Theorem A3.6.3 (Prokhorov) If $\{\mathbf{P}_n\}$ is a tight family of distributions then it is
relatively compact. If $\mathcal{X}$ is a complete separable space, the converse assertion is
also true.


Since, for many functional spaces (in particular, for spaces $C(0, T)$ and $D(0, T)$),
there exist simple explicit criteria for compactness of sets, one can now establish
conditions ensuring convergence $\mathbf{P}_n \Rightarrow \mathbf{P}$ in these spaces. It is well known, for
example, that in the above-mentioned spaces compacta are, roughly speaking, of the
form $\{x : \omega_\Delta(x) \le \varepsilon(\Delta)\}$, where $\omega_\Delta(x)$ is the so-called "modulus of continuity"
(in the space $C$ or $D$, respectively) of the element $x$, and $\varepsilon(\Delta) > 0$ is an arbitrary
function vanishing as $\Delta \downarrow 0$.

The proofs of Theorems A3.6.1—A3.6.3 can be found, for example, in [1]. We do 
not present them here as they are somewhat beyond the scope of this book and, on 
the other hand, the theorems themselves are not used in the body of the text. We use 
only the special cases of these theorems given in Sects. 6.2 and 6.3. 

The invariance principle of Sect. 20.1 is a theorem about weak convergence of distributions in the space $C(0, 1)$. In order to use Theorems A3.6.2 and A3.6.3 to prove this result, one has to choose the class $\mathcal{D}$ to be the class of cylinder sets. Convergence of $P_n$ to $P$ on sets from this class $\mathcal{D}$ is the convergence of finite-dimensional distributions of the processes $s_n(t)$ generated by sums of random variables (see Sect. 20.1). Since the increments of $s_n(t)$ are essentially independent, the demonstration of that part of the theorem reduces to proving asymptotic normality of these increments, which follows immediately from the central limit theorem. The condition of compactness of the family of distributions in $C(0, 1)$ requires, according to Theorem A3.6.3, a proof that the modulus of continuity of the trajectory $s_n(t)$ converges to zero in probability (for more details, see e.g. [1]). This could be proved using the Kolmogorov inequality from Corollary 11.2.1.


3.6.2 Convergence in Total Variation 


So, to consider weak convergence of distributions in spaces $(X, \mathfrak{B}_X)$ of a general nature, one has to introduce a topology in the space, which is not always convenient and feasible. There exists another type of convergence of distributions on $(X, \mathfrak{B}_X)$ which does not require the introduction of topologies. This is convergence in total variation.


Definition A3.6.5 Let $\gamma$ be a finite signed measure on $(X, \mathfrak{B}_X)$. The total variation of $\gamma$ (or the total variation norm $\|\gamma\|$) is the quantity

$$\|\gamma\| := \sup_{|f| \le 1} \Big| \int f(x)\,d\gamma(x) \Big|, \tag{A3.6.5}$$

where the supremum is taken over the class of all $\mathfrak{B}_X$-measurable functions $f(x)$ such that $|f(x)| \le 1$ for all $x \in X$.


The supremum in (A3.6.5) is clearly attained on such functions $f$ for which, roughly speaking, $f(x) = 1$ at points $x$ such that $d\gamma(x) > 0$, and $f(x) = -1$ at points $x$ for which $d\gamma(x) < 0$. Therefore (A3.6.5) can be written in the form

$$\|\gamma\| = \int |d\gamma(x)|. \tag{A3.6.6}$$

An exact meaning to this expression can be given using the Hahn decomposition theorem (see Corollary A3.5.1), which implies

$$\|\gamma\| = \gamma^+(X) + \gamma^-(X). \tag{A3.6.7}$$

The right-hand side of this equality may be taken as a definition of $\int |d\gamma(x)|$.

Lemma A3.6.2 If $\gamma(X) = 0$, then $\|\gamma\| = 2 \sup_{B \in \mathfrak{B}_X} \gamma(B)$.


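Proof From (A3.6.5) it follows that, for any $B$ ($\overline{B}$ is the complement of $B$, $\gamma(B) + \gamma(\overline{B}) = 0$),

$$\|\gamma\| \ge |\gamma(B)| + |\gamma(\overline{B})| = 2|\gamma(B)|.$$

Therefore $\|\gamma\| \ge 2 \sup_{B \in \mathfrak{B}_X} |\gamma(B)|$.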




To obtain the converse inequality, we will make use of Corollary A3.5.1 of the Hahn decomposition theorem. As we have already noted, according to that theorem (for the definition of the sets $D^\pm$ see the Hahn theorem),

$$\|\gamma\| = \gamma^+(X) + \gamma^-(X) = \gamma^+(D^+) + \gamma^-(D^-) = \gamma(D^+) - \gamma(D^-) = 2\gamma(D^+) \le 2 \sup_{B \in \mathfrak{B}_X} \gamma(B).$$


The lemma is proved. 
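On a finite set the lemma can be checked by brute force. The following is a minimal sketch, assuming a hypothetical signed measure given by the difference of two made-up probability vectors `p` and `q` (so that its total mass is zero); the function names are illustrative and do not come from the text.

```python
from itertools import combinations

def total_variation_norm(weights):
    """||gamma|| = gamma^+(X) + gamma^-(X) for a signed measure on a finite set."""
    return sum(abs(w) for w in weights)

def sup_over_sets(weights):
    """sup_B gamma(B), found by enumerating all subsets B of the finite set."""
    n = len(weights)
    best = 0.0  # value on the empty set
    for r in range(1, n + 1):
        for subset in combinations(range(n), r):
            best = max(best, sum(weights[i] for i in subset))
    return best

# Hypothetical example: gamma = p - q, hence gamma(X) = 0.
p = [0.5, 0.25, 0.125, 0.125]
q = [0.25, 0.25, 0.25, 0.25]
gamma = [a - b for a, b in zip(p, q)]

print(total_variation_norm(gamma), 2 * sup_over_sets(gamma))
```

The supremum is attained on the set of points carrying positive mass, which plays the role of the Hahn set $D^+$ in the proof.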


Definition A3.6.6 Let $P$ be a distribution and $P_n$, $n = 1, 2, \ldots$, a sequence of distributions given on $(X, \mathfrak{B}_X)$. We will say that $P_n$ converges to $P$ in total variation:

$$P_n \xrightarrow{TV} P, \quad \text{if } \|P_n - P\| \to 0 \text{ as } n \to \infty.$$


Convergence in total variation is a very strong form of convergence. If $(X, \mathfrak{B}_X)$ is a metric space and $P_n \xrightarrow{TV} P$, then $P_n \Rightarrow P$. Indeed, since any functional $f \in C_b(X)$ is bounded: $|f(x)| < b$, we have

$$\Big| \int f\,dP_n - \int f\,dP \Big| = \Big| \int f\,(dP_n - dP) \Big| \le b \int |d(P_n - P)| = b\,\|P_n - P\| \to 0.$$

Thus in that case

$$\int f\,dP_n \to \int f\,dP$$

even without assuming the continuity of $f$.


The converse assertion about convergence $P_n \xrightarrow{TV} P$ if $P_n \Rightarrow P$ is not true. Let, for example, $X = [0, 1]$, $P_n$ be the uniform distribution on the set of $n + 1$ points $\{0, 1/n, \ldots, n/n\}$, and $P = U_{0,1}$. It is clear that all $P_n$ are concentrated on the countable set $N$ of all rational numbers. Therefore $P_n(N) = 1$, $P(N) = 0$, and $\|P_n - P\| = P_n(N) + P(X \setminus N) = 2$. At the same time, clearly $P_n \Rightarrow P$.
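The weak convergence in this example can be made quantitative. The sketch below computes, in exact rational arithmetic, the uniform distance between the distribution function of the uniform distribution on $\{0, 1/n, \ldots, n/n\}$ and that of $U_{0,1}$. This uniform (Kolmogorov) distance is not used in the text and is introduced here only for illustration; it tends to zero while the total variation distance stays equal to 2 for every $n$.

```python
from fractions import Fraction

def kolmogorov_distance(n):
    """sup_x |F_n(x) - x| on [0, 1] for the uniform distribution on
    {0, 1/n, ..., n/n}. F_n is a step function and x -> x is continuous,
    so the supremum is attained at a jump, from the left or from the right."""
    worst = Fraction(0)
    for k in range(n + 1):
        x = Fraction(k, n)
        left = Fraction(k, n + 1)        # mass strictly to the left of x
        right = Fraction(k + 1, n + 1)   # mass up to and including x
        worst = max(worst, abs(left - x), abs(right - x))
    return worst

for n in (1, 10, 100, 1000):
    print(n, kolmogorov_distance(n))
```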

Now let the distribution $P$ have a density $p$ with respect to a measure $\mu$ (one could take, in particular, $\mu = P$, in which case $p(x) \equiv 1$). Denote by $p_n$ the density (with respect to $\mu$) of the absolutely continuous (with respect to $\mu$) component $P_n^a$ of the distribution $P_n$.


Theorem A3.6.4 A necessary and sufficient condition for convergence $P_n \xrightarrow{TV} P$ is that $p_n$ converges to $p$ in measure $\mu$, i.e., for any $\varepsilon > 0$,

$$\mu\{x : |p_n(x) - p(x)| > \varepsilon\} \to 0 \quad \text{as } n \to \infty.$$


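Proof We have

$$\int |d(P_n - P)| = \int |p_n(x) - p(x)|\,\mu(dx) + P_n^s(X),$$

where $P_n^s$ is the singular component of $P_n$ with respect to the measure $\mu$.

Let $\|P_n - P\| \to 0$. Then

$$\int |p_n - p|\,d\mu \to 0, \tag{A3.6.8}$$

and hence

$$\mu\{x : |p_n(x) - p(x)| > \varepsilon\} \le \varepsilon^{-1} \int |p_n - p|\,d\mu \to 0.$$

Now let $p_n \xrightarrow{\mu} p$. Put

$$B_\varepsilon := \{x : p(x) \ge \varepsilon\}, \qquad A_{n,\varepsilon} := \{x : |p_n(x) - p(x)| \le \varepsilon^2\}.$$

Then

$$1 \ge \int_{B_\varepsilon} p\,d\mu \ge \varepsilon\,\mu(B_\varepsilon), \qquad \mu(B_\varepsilon) \le \frac{1}{\varepsilon}.$$

Consider

$$\int |p_n - p|\,d\mu = \int_{B_\varepsilon A_{n,\varepsilon}} + \int_{\overline{B_\varepsilon A_{n,\varepsilon}}}. \tag{A3.6.9}$$

Here the first integral on the right-hand side does not exceed $\varepsilon$. Since

$$\lim_{\varepsilon \to 0} \int_{B_\varepsilon} p\,d\mu = 1,$$

we will have, for a given $\delta > 0$ and sufficiently small $\varepsilon$, the inequality

$$\int_{B_\varepsilon} p\,d\mu > 1 - \delta$$

and, for $n$ large enough,

$$\int_{B_\varepsilon A_{n,\varepsilon}} p\,d\mu > 1 - 2\delta, \qquad \int_{B_\varepsilon A_{n,\varepsilon}} p_n\,d\mu > 1 - 3\delta. \tag{A3.6.10}$$

It follows from these two inequalities that the second integral in (A3.6.9) does not exceed $5\delta$, which proves (A3.6.8). Furthermore, (A3.6.10) implies that $\|P_n^a\| > 1 - 3\delta$ and $\|P_n^s\| < 3\delta$. The theorem is proved.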


The theorem implies that if $P_n \xrightarrow{TV} P$ then the component $P_n^a$ of the distribution $P_n$ that is absolutely continuous with respect to $\mu = P$ has a density $p_n(x) \to 1$ in measure $P$, and $P_n^a(X) \to 1$.


Appendix 4 
The Helly and Arzela—Ascoli Theorems 


In this appendix we will prove Helly’s theorem and the Arzela—Ascoli theorem. The 
former theorem was used in Sect. 6.3, and both theorems will be used in the proof 
of the main theorem of Appendix 9. 

Let $\mathcal{F}$ be the class of all distribution functions, and $\mathcal{G}$ the class of functions $G$ possessing properties F1 and F2 from Sect. 3.2 (monotonicity and left continuity), and the properties $G(-\infty) \ge 0$ and $G(\infty) \le 1$. We will write $G_n \Rightarrow G$ as $n \to \infty$, $G \in \mathcal{G}$, if $G_n(x) \to G(x)$ at all points of continuity of the function $G$.


Theorem A4.1 (Helly) Any sequence $F_n \in \mathcal{F}$ contains a convergent subsequence $F_{n_k} \Rightarrow F \in \mathcal{G}$.


We will need the following. 
Lemma A4.1 A sufficient condition for convergence $F_n \Rightarrow F \in \mathcal{G}$ is that

$$F_n(x) \to F(x), \quad x \in D,$$

as $n \to \infty$ on some everywhere dense set $D$ of the reals.


Proof Let $x$ be an arbitrary point of continuity of $F(x)$. For arbitrary $x', x'' \in D$ such that $x' \le x \le x''$, one has

$$F_n(x') \le F_n(x) \le F_n(x'').$$

Consequently,

$$\lim_{n\to\infty} F_n(x') \le \liminf_{n\to\infty} F_n(x) \le \limsup_{n\to\infty} F_n(x) \le \lim_{n\to\infty} F_n(x'').$$

From here and the conditions of the lemma we obtain

$$F(x') \le \liminf_{n\to\infty} F_n(x) \le \limsup_{n\to\infty} F_n(x) \le F(x'').$$

Letting $x' \uparrow x$ and $x'' \downarrow x$ along the set $D$ and taking into account that $x$ is a point of continuity of $F$, we get

$$\lim_{n\to\infty} F_n(x) = F(x).$$


The lemma is proved. 


Proof of Theorem A4.1 Let $D = \{x_n\}$ be an arbitrary countable everywhere dense set of real numbers. The numerical sequence $\{F_n(x_1)\}$ is bounded and hence contains a convergent subsequence $\{F_{1n}(x_1)\}$. Denote the limit of this subsequence by $F(x_1)$. Consider now the numerical sequence $\{F_{1n}(x_2)\}$. It also contains a convergent subsequence $\{F_{2n}(x_2)\}$ with a limit $F(x_2)$. Moreover,

$$\lim_{n\to\infty} F_{2n}(x_1) = F(x_1).$$

Continuing this process, we will obtain, for any number $k$, $k$ sequences

$$\{F_{kn}(x_i)\}, \quad i = 1, \ldots, k,$$

such that $\lim_{n\to\infty} F_{kn}(x_i) = F(x_i)$.

Consider the diagonal sequence of the distribution functions $\{F_{nn}(x)\}$. For any $x_k \in D$, only the first $k - 1$ elements of the numerical sequence $\{F_{nn}(x_k)\}$ may not belong to the sequence $F_{kn}(x_k)$. Therefore

$$\lim_{n\to\infty} F_{nn}(x_k) = F(x_k).$$

It is clear that $F(x)$ is a non-decreasing bounded function given on $D$. It can easily be extended by continuity from the left to a non-decreasing function on the whole real line. Now we see that the sequence $\{F_{nn}\}$ and the function $F$ satisfy the conditions of Lemma A4.1. The theorem is proved.


The conditions of Helly’s theorem can be weakened. Namely, instead of $\mathcal{F}$ we could consider a wider class $\mathcal{H}$ of non-decreasing left continuous (i.e., satisfying properties F1 and F3) functions $H$ majorised by a fixed function: for any $x$, $|H(x)| \le N(x) < \infty$, where $N$ is a given function characterising the class $\mathcal{H}$. We do not exclude the case when $|H(x)|$ (or $N(x)$) grows unboundedly as $|x| \to \infty$. The following generalised version of Helly’s theorem is true.


Theorem A4.2 (Generalised Helly theorem) Any sequence $H_n \in \mathcal{H}$ contains a subsequence $H_{n_k}$ which converges to a function $H \in \mathcal{H}$ at each point of continuity of $H$.


The proof repeats the above proof of Helly’s theorem.


To each function $H_n \in \mathcal{H}$ we can associate a measure $\mu_n$ by putting

$$\mu_n([a, b)) = H_n(b) - H_n(a).$$




The generalised Helly theorem will then mean that, for any sequence of measures $\mu_n$ generated by functions from $\mathcal{H}$, there exists a subsequence $\mu_{n_k}$ converging weakly on each finite interval whose endpoints are not atoms of the limiting measure $\mu$.

We give one more analogue of Helly’s theorem which refers to a collection of equicontinuous functions $g_n$. Recall that a sequence of functions $\{g_n\}$ is said to be equicontinuous if, for any $\varepsilon > 0$, there exists a $\delta > 0$ such that $|x_1 - x_2| < \delta$ implies $|g_n(x_1) - g_n(x_2)| < \varepsilon$ for all $n$.


Theorem A4.3 (Arzelà–Ascoli) Let $\{g_n\}$ be a sequence of uniformly bounded and equicontinuous functions of a real variable. Then there exists a subsequence $g_{n_k}$ converging to a continuous limit $g$ uniformly on each finite interval.


Proof Choose again a countable everywhere dense subset $\{x_n\}$ of the real line, and a subsequence $\{g_{n_k}\}$ converging at the points $x_1, x_2, \ldots$ Denote its limit at the point $x_i$ by $g(x_i)$. We have

$$|g_{n_k}(x) - g_{n_l}(x)| \le |g_{n_k}(x) - g_{n_k}(x_j)| + |g_{n_l}(x) - g_{n_l}(x_j)| + |g_{n_k}(x_j) - g_{n_l}(x_j)|. \tag{A4.1}$$

By assumption, the last term on the right-hand side tends to 0 as $n_k \to \infty$, $n_l \to \infty$. By virtue of equicontinuity, for any point $x$ there exists a point $x_j$ such that, for all $n$,

$$|g_n(x) - g_n(x_j)| < \varepsilon. \tag{A4.2}$$

In any given finite interval $I$ there exists a finite collection of points $x_j$ such that (A4.2) will hold for all points $x \in I$. This implies that the right-hand side of (A4.1) will be less than $3\varepsilon$ for all sufficiently large $n_k$, $n_l$, uniformly over $x \in I$. Thus there exists the limit $g(x) = \lim g_{n_k}(x)$, for which by (A4.2) we have $|g(x) - g(x_j)| \le \varepsilon$, which implies that $g$ is continuous. The theorem is proved.


Appendix 5 
The Proof of the Berry—Esseen Theorem 


In this appendix we prove the following assertion stated in Sect. 8.5. 


Theorem A5.1 (Berry–Esseen) Let $\xi_k$ be independent identically distributed random variables,

$$E\xi_k = 0, \quad \mathrm{Var}(\xi_k) = 1, \quad \mu = E|\xi_k|^3 < \infty, \quad S_n = \sum_{k=1}^n \xi_k, \quad \zeta_n = \frac{S_n}{\sqrt{n}}.$$

Then, for all $n$,

$$\Delta_n := \sup_x |P(\zeta_n < x) - \Phi(x)| < \frac{c\mu}{\sqrt{n}},$$

where $\Phi$ is the standard normal distribution function and $c$ is an absolute constant.
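The rate $n^{-1/2}$ can be observed numerically. The following is a minimal sketch, assuming symmetric summands $\xi_k = \pm 1$ with probability $1/2$ each (a choice made here for illustration, for which $\mu = 1$); it computes $\Delta_n$ exactly from the binomial distribution of $S_n$ and prints $\sqrt{n}\,\Delta_n$. The function names are illustrative.

```python
import math

def std_normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def delta_n(n):
    """sup_x |P(zeta_n < x) - Phi(x)| for S_n a sum of n symmetric +-1 variables.
    The d.f. of zeta_n is a step function and Phi is continuous and increasing,
    so the supremum is attained at a jump point, from the left or the right."""
    total = 2 ** n
    cdf_left = 0.0
    worst = 0.0
    for k in range(n + 1):                 # k = number of +1 summands
        x = (2 * k - n) / math.sqrt(n)     # corresponding value of zeta_n
        cdf_right = cdf_left + math.comb(n, k) / total
        g = std_normal_cdf(x)
        worst = max(worst, abs(cdf_left - g), abs(cdf_right - g))
        cdf_left = cdf_right
    return worst

for n in (1, 2, 10, 100, 400):
    print(n, delta_n(n), math.sqrt(n) * delta_n(n))
```

In this example the scaled quantity $\sqrt{n}\,\Delta_n$ stays bounded, in agreement with the theorem.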


Proof We will make use of the composition method. As in Sect. 8.5, we will bound $\Delta_n$ based on estimates of smallness of $Eg(\zeta_n) - Eg(\eta)$, $\eta \mathrel{\subset\!\!\!=} \Phi_{0,1}$, for smooth $g$. To get a bound for $\Delta_n$ in Sect. 8.5, we chose $g$ to be a function constant outside a small interval. The next lemma shows that such a choice is not obligatory. Let $G$ be a distribution function and $\gamma \mathrel{\subset\!\!\!=} G$ be independent of $\zeta_n$ and $\eta$. Put

$$g(z) := G\Big(\frac{x - z}{\varepsilon}\Big),$$

so that

$$Eg(\zeta_n) = EG\Big(\frac{x - \zeta_n}{\varepsilon}\Big) = P\Big(\gamma < \frac{x - \zeta_n}{\varepsilon}\Big) = P(\zeta_n + \varepsilon\gamma < x), \qquad Eg(\eta) = P(\eta + \varepsilon\gamma < x).$$

Set

$$\Delta_{n,\varepsilon} := \sup_x |Eg(\zeta_n) - Eg(\eta)| = \sup_x |P(\zeta_n + \varepsilon\gamma < x) - P(\eta + \varepsilon\gamma < x)| = \sup_x \Big| \int dG(y)\,[P(\zeta_n < x - \varepsilon y) - P(\eta < x - \varepsilon y)] \Big|.$$

Clearly, $\Delta_{n,\varepsilon} \le \Delta_n$. Our aim will be to obtain a converse inequality for $\Delta_n$.


Lemma A5.1 Let $v > 0$ be such that $G(v) - G(-v) \ge 3/4$. Then, for any $\varepsilon > 0$,

$$\Delta_n \le 2\Delta_{n,\varepsilon} + \frac{3v\varepsilon}{\sqrt{2\pi}}.$$


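Proof Assume that the $\sup_x$ in the definition of $\Delta_n$ is attained on a positive value $\Delta_n(x) := F_n(x) - \Phi(x)$ (the case of a negative value $\Delta_n(x)$ is similar) and that, for a given $\delta > 0$, the value $x_\delta$ is such that

$$\Delta_n(x_\delta) = F_n(x_\delta) - \Phi(x_\delta) \ge \Delta_n - \delta,$$

where $F_n$ is the distribution function of $\zeta_n$. When the argument increases, the value of $\Delta_n(x_\delta)$ varies little in the following sense. Let $|y| < v$. Then $v - y > 0$ and

$$\Delta_n(x_\delta + \varepsilon(v - y)) = F_n(x_\delta + \varepsilon(v - y)) - \Phi(x_\delta + \varepsilon(v - y)) \ge F_n(x_\delta) - \Phi(x_\delta) - [\Phi(x_\delta + \varepsilon(v - y)) - \Phi(x_\delta)].$$

Here the difference in the brackets does not exceed $\varepsilon(v - y)\Phi'(0) \le 2v\varepsilon/\sqrt{2\pi}$, and hence

$$\Delta_n(x_\delta + \varepsilon(v - y)) \ge \Delta_n - \delta - \frac{2v\varepsilon}{\sqrt{2\pi}}.$$

Therefore

$$\Delta_{n,\varepsilon} \ge \int dG(y)\,\Delta_n(x_\delta + \varepsilon v - \varepsilon y) = \int_{|y| < v} + \int_{|y| \ge v} \ge \frac{3}{4}\Big(\Delta_n - \delta - \frac{2v\varepsilon}{\sqrt{2\pi}}\Big) - \frac{1}{4}\Delta_n = \frac{\Delta_n}{2} - \frac{3}{4}\Big(\delta + \frac{2v\varepsilon}{\sqrt{2\pi}}\Big).$$

Since $\delta$ is arbitrary, the assertion of the lemma follows.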


Corollary A5.1 For $G = \Phi$ ($\gamma \mathrel{\subset\!\!\!=} \Phi_{0,1}$) the value $v = 6/5$ satisfies the condition of Lemma A5.1, and

$$\Delta_n \le 2(\Delta_{n,\varepsilon} + \varepsilon). \tag{A5.1}$$
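The two numerical facts behind the corollary are easy to verify: the standard normal law puts mass at least $3/4$ on $[-6/5, 6/5]$, and the coefficient $3v/\sqrt{2\pi}$ from Lemma A5.1 does not exceed 2 for $v = 6/5$. A short check (variable names are illustrative):

```python
import math

def std_normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

v = 6 / 5
mass = std_normal_cdf(v) - std_normal_cdf(-v)   # G(v) - G(-v) for G = Phi
coefficient = 3 * v / math.sqrt(2 * math.pi)    # factor multiplying epsilon in Lemma A5.1

print(mass, coefficient)
```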


At the next stage of the proof we bound $\Delta_{n,\varepsilon}$, and it is at that stage where the composition method will be used. Put

$$\nu(n) := \max_{k \le n} \frac{\Delta_k \sqrt{k}}{\mu}, \qquad a := \varepsilon\sqrt{n}.$$

By letters $c$ (with or without indices) we will denote absolute constants, not necessarily the same ones.


Lemma A5.2 For $a \ge 1$,

$$\Delta_{n,\varepsilon} \le c\mu\Big(\frac{1}{\sqrt{n}} + \frac{\nu(n-1)}{a\sqrt{n}}\Big). \tag{A5.2}$$


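Proof Set $H_n := \sum_{k=1}^n \eta_k$, where $\eta_k \mathrel{\subset\!\!\!=} \Phi_{0,1}$ are independent of each other and of $S_n$ and $\gamma$. The composition method is based on the following identity (cf. Theorem 8.5.1 and identity (8.5.3), $\eta \mathrel{\subset\!\!\!=} \Phi_{0,1}$):

$$P(\zeta_n + \varepsilon\gamma < x) - P(\eta + \varepsilon\gamma < x) = P(S_n + a\gamma < x\sqrt{n}) - P(H_n + a\gamma < x\sqrt{n})$$

$$= \sum_{m=1}^n \big[ P(S_{m-1} + (H_n - H_m) + \xi_m + a\gamma < x\sqrt{n}) - P(S_{m-1} + (H_n - H_m) + \eta_m + a\gamma < x\sqrt{n}) \big].$$

Since for $\gamma \mathrel{\subset\!\!\!=} \Phi_{0,1}$ one has $H_n - H_m + a\gamma \mathrel{\subset\!\!\!=} \Phi_{0,\,n-m+a^2}$, the last sum is equal to $\sum_{m=1}^n D_m$, where

$$D_m := E\Big[ \Phi\Big(\frac{x\sqrt{n} - S_{m-1} - \xi_m}{d_m}\Big) - \Phi\Big(\frac{x\sqrt{n} - S_{m-1} - \eta_m}{d_m}\Big) \Big] = E\Big[ \Phi\Big(T_m - \frac{\xi_m}{d_m}\Big) - \Phi\Big(T_m - \frac{\eta_m}{d_m}\Big) \Big],$$

$$d_m^2 := n - m + a^2, \qquad T_m := \frac{x\sqrt{n} - S_{m-1}}{d_m}.$$

To bound $D_m$ we will adopt the same approach as in Lemma 8.5.1. Because the first two moments of $\xi_m$ and $\eta_m$ coincide, expanding $\Phi$ into a series yields

$$|D_m| \le \frac{2\mu}{d_m^3} \sup_t |E\phi''(T_m + t)|,$$

where $\phi(x) = \Phi'(x)$ and $\phi'' = \Phi'''$. Since the function $\phi''$ is bounded,

$$|D_m| \le \frac{c\mu}{d_m^3}. \tag{A5.3}$$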
We will also need another bound for $D_m$. To obtain it, consider the quantity

$$R_m := \sup_t |E\phi''(T_m + t)| \le \sup_t |E[\phi''(T_m + t) - \phi''(V_m + t)]| + \sup_t |E\phi''(V_m + t)|, \tag{A5.4}$$

where $V_m$ is defined in the same way as $T_m$ but with $S_{m-1}$ replaced by $H_{m-1}$. Integrating by parts yields

$$|E[\phi''(T_m + t) - \phi''(V_m + t)]| = \Big| \int \phi''(u + t)\,d[P(T_m < u) - P(V_m < u)] \Big| = \Big| \int \phi'''(u + t)[P(T_m < u) - P(V_m < u)]\,du \Big| \le \Delta_{m-1} \int |\phi'''(u)|\,du = c\Delta_{m-1},$$

since $|P(T_m < u) - P(V_m < u)| \le \Delta_{m-1}$ (the variables $T_m$ and $V_m$ are obtained from $S_{m-1}/\sqrt{m-1}$ and $H_{m-1}/\sqrt{m-1}$, respectively, by one and the same linear transformation).

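To bound the second summand on the right-hand side of (A5.4), note that

$$E\phi''(V_m + t) = \int \phi''(u + t)\,\frac{1}{r_m}\,\phi\Big(\frac{u - \alpha_m}{r_m}\Big)\,du, \tag{A5.5}$$

where

$$\alpha_m := \frac{x\sqrt{n}}{d_m}, \qquad r_m^2 := \frac{m - 1}{n - m + a^2},$$

so that $\frac{1}{r_m}\phi\big(\frac{u - \alpha_m}{r_m}\big)$ is the density of $V_m = (x\sqrt{n} - H_{m-1})/d_m$. Integrating the right-hand side of (A5.5) twice by parts, we obtain

$$|E\phi''(V_m + t)| = \frac{1}{r_m^3} \Big| \int \phi(u + t)\,\phi''\Big(\frac{u - \alpha_m}{r_m}\Big)\,du \Big| \le \frac{c}{r_m^3}.$$

Thus,

$$R_m \le c\Big(\Delta_{m-1} + \frac{1}{r_m^3}\Big), \qquad |D_m| \le c\mu\Big(\frac{\Delta_{m-1}}{d_m^3} + \frac{1}{(m-1)^{3/2}}\Big).$$

The bounds derived for $D_m$ do not depend on $x$. Therefore, using the bound just obtained for $m > n/2$, and bound (A5.3) for $m \le n/2$ (the latter bound implies then that $|D_m| \le c\mu/n^{3/2}$), we get

$$\Delta_{n,\varepsilon} \le c\mu\Big[ \sum_{m \le n/2} n^{-3/2} + \sum_{m > n/2} \frac{\Delta_{m-1}}{d_m^3} + \sum_{m > n/2} \frac{1}{(m-1)^{3/2}} \Big]. \tag{A5.6}$$

Here the first sum does not exceed $(n/2)n^{-3/2} = 1/(2\sqrt{n})$ and the last sum does not exceed

$$\int_{n/2 - 1}^\infty \frac{ds}{s^{3/2}} \le \frac{c}{\sqrt{n}}.$$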



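It remains to bound the middle sum. Setting $\nu(n) := \max_{k \le n} (\Delta_k\sqrt{k}/\mu)$, we have

$$\sum_{m > n/2} \frac{\Delta_{m-1}}{d_m^3} \le \mu\,\nu(n - 1)\,\frac{c}{\sqrt{n}} \sum_{m > n/2} \frac{1}{d_m^3}.$$

The last sum does not exceed

$$\sum_{m > n/2} \frac{1}{(n - m + a^2)^{3/2}} \le \frac{1}{a^3} + \int_0^\infty \frac{dt}{(t + a^2)^{3/2}} = \frac{1}{a^3} + \frac{2}{a} \le \frac{3}{a},$$

provided that $a \ge 1$. Collecting (A5.6) and the above estimates together, we obtain the assertion of the lemma.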


We now turn directly to the proof of the theorem. By virtue of (A5.1) and (A5.2),

$$v(n) := \frac{\Delta_n\sqrt{n}}{\mu} \le \frac{2\sqrt{n}}{\mu}(\Delta_{n,\varepsilon} + \varepsilon) \le 2c + \frac{2c\,\nu(n - 1)}{a} + \frac{2a}{\mu}.$$

Put here $a := \max(4c, 1)$. Then ($\mu \ge 1$)

$$v(n) \le c_1 + \frac{\nu(n - 1)}{2}.$$

This implies that $\nu(n) \le 2c_1$ for all $n$. To verify this, we make use of induction. Clearly, $\nu(1) = v(1) \le 1 \le 2c_1$. Let $\nu(n - 1) \le 2c_1$. Then $v(n) \le 2c_1$ and $\nu(n) = \max(v(n), \nu(n - 1)) \le 2c_1$. The theorem is proved.


Appendix 6 
The Basic Properties of Regularly Varying 
Functions and Subexponential Distributions 


The properties of regularly varying functions and subexponential distributions were 
used in Sects. 8.8, 9.4—9.6 and 12.7 and will be used in Appendices 7 and 8. 


6.1 General Properties of Regularly Varying Functions 


Definition A6.1.1 A positive measurable function $L(t)$ is called a slowly varying function (s.v.f.) as $t \to \infty$ if, for any fixed $v > 0$,

$$\frac{L(vt)}{L(t)} \to 1 \quad \text{as } t \to \infty. \tag{A6.1.1}$$

A function $V(t)$ is called a regularly varying function (r.v.f.) (with exponent $-\beta \in \mathbb{R}$) as $t \to \infty$ if it can be represented as

$$V(t) = t^{-\beta} L(t), \tag{A6.1.2}$$

where $L(t)$ is an s.v.f. as $t \to \infty$. We will denote the class of all r.v.f.s by $\mathcal{R}$.


The definitions of an s.v.f. and r.v.f. as t | 0 are quite similar. In what follows, 
the term s.v.f. (r.v.f.) will (unless specified otherwise) always refer to a slowly (reg- 
ularly) varying function at infinity. 

It is easy to see that, similarly to (A6.1.1), a characteristic property of regularly varying functions is the convergence, for any fixed $v > 0$,

$$\frac{V(vt)}{V(t)} \to v^{-\beta} \quad \text{as } t \to \infty. \tag{A6.1.3}$$

Thus, an s.v.f. is an r.v.f. with exponent zero.

Typical representatives of the class of s.v.f.s are the logarithmic function and its powers $\ln^\gamma t$, $\gamma \in \mathbb{R}$, their linear combinations, multiple logarithms, functions with the property $L(t) \to L = \mathrm{const} \ne 0$ as $t \to \infty$, etc. As an example of a bounded oscillating s.v.f. one can give

$$L_0(t) = 2 + \sin(\ln \ln t), \quad t > 1.$$
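The defining ratio $L(vt)/L(t)$ can be inspected numerically. The sketch below evaluates it for $\ln t$, for the oscillating function $2 + \sin(\ln\ln t)$, and, for contrast, for the power function $t^{0.1}$ (an extra example chosen here, regularly but not slowly varying), with the fixed value $v = 10$. The function names are illustrative.

```python
import math

def log_fn(t):
    return math.log(t)

def oscillating(t):
    """Bounded oscillating s.v.f. 2 + sin(ln ln t), defined for t > 1."""
    return 2.0 + math.sin(math.log(math.log(t)))

def power(t):
    """t^0.1: regularly varying with a non-zero exponent, so not slowly varying."""
    return t ** 0.1

def ratio(f, v, t):
    """The quantity f(vt)/f(t) from the definition of slow/regular variation."""
    return f(v * t) / f(t)

for t in (1e3, 1e30, 1e300):
    print(t, ratio(log_fn, 10, t), ratio(oscillating, 10, t), ratio(power, 10, t))
```

The first two ratios approach 1, although only at a logarithmic rate, while the third stays at $10^{0.1}$ for every $t$.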
We will need the following two basic properties of s.v.f.s. 


Theorem A6.1.1 (Uniform convergence theorem) If $L(t)$ is an s.v.f. as $t \to \infty$ then convergence (A6.1.1) holds uniformly in $v$ on any segment $[v_1, v_2]$, $0 < v_1 < v_2 < \infty$.


The theorem implies that the uniform convergence (A6.1.1) on the segment $[1/M, M]$ also takes place in the case when, as $t \to \infty$, the quantity $M = M(t)$ grows unboundedly slowly enough.


Theorem A6.1.2 (Integral representation) A function $L(t)$ is an s.v.f. as $t \to \infty$ if and only if, for some $t_0 > 0$, one has

$$L(t) = c(t) \exp\Big\{ \int_{t_0}^t \frac{\varepsilon(u)}{u}\,du \Big\}, \quad t \ge t_0, \tag{A6.1.4}$$

where the functions $c(t)$ and $\varepsilon(t)$ are measurable and such that $c(t) \to c \in (0, \infty)$ and $\varepsilon(t) \to 0$ as $t \to \infty$.


For instance, for $L(t) = \ln t$ representation (A6.1.4) is valid with $c(t) = 1$, $t_0 = e$ and $\varepsilon(t) = (\ln t)^{-1}$.
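The instance with the logarithm can be confirmed by numerical integration. The sketch below evaluates the right-hand side of the integral representation with $c(t) = 1$, $t_0 = e$ and $\varepsilon(u) = 1/\ln u$, using the substitution $u = e^s$ (which turns the integral into $\int_1^{\ln t} ds/s$) and a plain midpoint rule; the step count is an arbitrary choice.

```python
import math

def representation(t, steps=100_000):
    """exp( integral_{e}^{t} du / (u ln u) ), computed after the substitution
    u = e^s as exp( integral_{1}^{ln t} ds / s ) by the midpoint rule."""
    a, b = 1.0, math.log(t)
    h = (b - a) / steps
    integral = sum(h / (a + (i + 0.5) * h) for i in range(steps))
    return math.exp(integral)

for t in (10.0, 1e3, 1e6):
    print(t, representation(t), math.log(t))
```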


Proof of Theorem A6.1.1 Put

$$h(x) := \ln L(e^x). \tag{A6.1.5}$$

Then property (A6.1.1) of s.v.f.s is equivalent, for each $u \in \mathbb{R}$, to the condition that the convergence

$$h(x + u) - h(x) \to 0 \tag{A6.1.6}$$

takes place as $x \to \infty$. To prove the theorem, we need to show that this convergence is uniform in $u \in [u_1, u_2]$ for any fixed $u_i \in \mathbb{R}$. In order to do that, it suffices to verify that convergence (A6.1.6) is uniform on the segment $[0, 1]$. Indeed, from the obvious inequality

$$|h(x + u_1 + u_2) - h(x)| \le |h(x + u_1 + u_2) - h(x + u_1)| + |h(x + u_1) - h(x)| \tag{A6.1.7}$$

we have

$$|h(x + u) - h(x)| \le (u_2 - u_1 + 1) \sup_{y \in [0, 1]} |h(x + y) - h(x)|, \quad u \in [u_1, u_2].$$


For a given $\varepsilon \in (0, 1)$ and an $x > 0$, set $I_x := [x, x + 2]$,

$$I_x^* := \{u \in I_x : |h(u) - h(x)| \ge \varepsilon/2\}, \qquad I_{0,x}^* := \{u \in I_0 : |h(x + u) - h(x)| \ge \varepsilon/2\}.$$

Clearly, the sets $I_x^*$ and $I_{0,x}^*$ are measurable and differ from each other by a translation by $x$, so that $\mu(I_x^*) = \mu(I_{0,x}^*)$, where $\mu$ is the Lebesgue measure. By (A6.1.6) the indicator function of the set $I_{0,x}^*$ converges, at each point $u \in I_0$, to 0 as $x \to \infty$. Therefore, by the dominated convergence theorem, the integral of this function, being equal to $\mu(I_{0,x}^*)$, converges to 0, so that $\mu(I_x^*) < \varepsilon/2$ for $x \ge x_0$, where $x_0$ is large enough.

Further, for $s \in [0, 1]$, the segment $I_x \cap I_{x+s} = [x + s, x + 2]$ has length $2 - s \ge 1$, so that, for $x \ge x_0$, the set

$$(I_x \cap I_{x+s}) \setminus (I_x^* \cup I_{x+s}^*)$$

has measure $\ge 1 - \varepsilon > 0$ and hence is non-empty. Let $y$ be a point from this set. Then

$$|h(x + s) - h(x)| \le |h(x + s) - h(y)| + |h(y) - h(x)| < \varepsilon/2 + \varepsilon/2 = \varepsilon$$

for $x \ge x_0$, which proves the required uniformity on $[0, 1]$ and hence on any fixed segment. The theorem is proved.


Proof of Theorem A6.1.2 The fact that the right-hand side of (A6.1.4) is an s.v.f. is almost obvious: for any fixed $v \ne 1$,

$$\frac{L(vt)}{L(t)} = \frac{c(vt)}{c(t)} \exp\Big\{ \int_t^{vt} \frac{\varepsilon(u)}{u}\,du \Big\}, \tag{A6.1.8}$$

where $c(vt)/c(t) \to c/c = 1$ and, as $t \to \infty$,

$$\int_t^{vt} \frac{\varepsilon(u)}{u}\,du = o\Big( \int_t^{vt} \frac{du}{u} \Big) = o(\ln v) = o(1). \tag{A6.1.9}$$

We now prove that any s.v.f. admits the representation (A6.1.4). The required representation in terms of the function (A6.1.5) is equivalent (after substituting $t = e^x$) to the relation

$$h(x) = d(x) + \int_{x_0}^x \delta(y)\,dy, \tag{A6.1.10}$$

where $d(x) = \ln c(e^x) \to d \in \mathbb{R}$ and $\delta(x) = \varepsilon(e^x) \to 0$ as $x \to \infty$, $x_0 = \ln t_0$. Therefore it suffices to establish representation (A6.1.10) for the function $h(x)$.

First of all note that $h(x)$ (as well as $L(t)$) is a “locally bounded” function. Indeed, Theorem A6.1.1 implies that, for $x_0$ large enough and all $x \ge x_0$,

$$\sup_{0 \le y \le 1} |h(x + y) - h(x)| < 1.$$

Hence, for any $x > x_0$, we have by virtue of (A6.1.7) the bound

$$|h(x) - h(x_0)| \le x - x_0 + 1.$$

Further, the local boundedness and measurability of the function $h$ mean that it is locally integrable on $[x_0, \infty)$ and hence can be represented for $x \ge x_0$ as

$$h(x) = \int_{x_0}^{x_0 + 1} h(y)\,dy + \int_0^1 (h(x) - h(x + y))\,dy + \int_{x_0}^x (h(y + 1) - h(y))\,dy. \tag{A6.1.11}$$

The first integral in (A6.1.11) is a constant, which will be denoted by $d$. The second integral, by virtue of Theorem A6.1.1, converges to zero as $x \to \infty$, so that

$$d(x) := d + \int_0^1 (h(x) - h(x + y))\,dy \to d, \quad x \to \infty.$$

As for the third integral in (A6.1.11), by the definition of an s.v.f., the integrand satisfies

$$\delta(y) := h(y + 1) - h(y) \to 0$$

as $y \to \infty$, which completes the proof of representation (A6.1.10).

6.2 The Basic Asymptotic Properties 


In this section we will obtain a number of consequences of Theorems A6.1.1 and 
A6.1.2 that are related to the asymptotic behaviour of s.v.f.s and r.v.f.s. 


Theorem A6.2.1 (i) If $L_1$ and $L_2$ are s.v.f.s then $L_1 + L_2$, $L_1 L_2$, $L_1^b$ and $L(t) := L_1(at + b)$, where $a > 0$ and $b \in \mathbb{R}$, are also s.v.f.s.

(ii) If $L$ is an s.v.f. then, for any $\delta > 0$, there exists a $t_\delta > 0$ such that

$$t^{-\delta} \le L(t) \le t^{\delta} \quad \text{for all } t \ge t_\delta. \tag{A6.2.1}$$

In other words, $L(t) = t^{o(1)}$ as $t \to \infty$.

(iii) If $L$ is an s.v.f. then, for any $\delta > 0$ and $v_0 > 1$, there exists a $t_\delta > 0$ such that, for all $v \ge v_0$ and $t \ge t_\delta$,

$$v^{-\delta} \le \frac{L(vt)}{L(t)} \le v^{\delta}. \tag{A6.2.2}$$

(iv) (Karamata’s theorem) If an r.v.f. $V$ in (A6.1.2) has exponent $-\beta$, $\beta > 1$, then

$$V^I(t) := \int_t^\infty V(u)\,du \sim \frac{tV(t)}{\beta - 1} \quad \text{as } t \to \infty. \tag{A6.2.3}$$

If $\beta < 1$ then

$$V_I(t) := \int_0^t V(u)\,du \sim \frac{tV(t)}{1 - \beta} \quad \text{as } t \to \infty. \tag{A6.2.4}$$

If $\beta = 1$ then

$$V_I(t) = tV(t)L_1(t) \tag{A6.2.5}$$

and

$$V^I(t) = tV(t)L_2(t) \quad \text{if } \int_0^\infty V(u)\,du < \infty, \tag{A6.2.6}$$

where $L_i(t) \to \infty$ as $t \to \infty$, $i = 1, 2$, are s.v.f.s.

(v) For an r.v.f. $V$ with exponent $-\beta < 0$, put

$$b(t) := V^{(-1)}(1/t) = \inf\{u : V(u) < 1/t\}.$$

Then $b(t)$ is an r.v.f. with exponent $1/\beta$:

$$b(t) = t^{1/\beta} L_b(t), \tag{A6.2.7}$$

where $L_b$ is an s.v.f. If the function $L$ possesses the property

$$L(tL^{1/\beta}(t)) \sim L(t) \tag{A6.2.8}$$

as $t \to \infty$ then

$$L_b(t) \sim L^{1/\beta}(t^{1/\beta}). \tag{A6.2.9}$$

Similar assertions hold for functions slowly/regularly varying as $t \downarrow 0$.
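Karamata's theorem for $\beta > 1$ can be illustrated numerically. The sketch below takes the hypothetical r.v.f. $V(t) = t^{-2}\ln t$ (so $\beta = 2$ and $L(t) = \ln t$, choices made here only for illustration), evaluates the tail integral $\int_t^\infty V(u)\,du$ by a midpoint rule after the substitution $u = te^s$, and compares it with $tV(t)/(\beta - 1)$. The truncation point and step count are arbitrary.

```python
import math

BETA = 2.0

def V(u):
    """Hypothetical regularly varying function u^(-BETA) * ln u."""
    return u ** (-BETA) * math.log(u)

def upper_integral(t, s_max=50.0, steps=100_000):
    """integral_t^infinity V(u) du via u = t * e^s (so du = u ds),
    truncated at s = s_max and evaluated by the midpoint rule."""
    h = s_max / steps
    total = 0.0
    for i in range(steps):
        u = t * math.exp((i + 0.5) * h)
        total += V(u) * u * h
    return total

def karamata_ratio(t):
    """Ratio of the tail integral to its asymptotic equivalent t V(t) / (BETA - 1)."""
    return upper_integral(t) / (t * V(t) / (BETA - 1.0))

for t in (1e2, 1e6, 1e100):
    print(t, karamata_ratio(t))
```

For this particular $V$ the integral equals $(\ln t + 1)/t$, so the ratio is $1 + 1/\ln t$: it does tend to 1, but only at a logarithmic rate, which is typical when a slowly varying factor is present.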
Note that Theorem A6.1.1 and inequality (A6.2.2) imply the following property of s.v.f.s: for any $\delta > 0$ there exists a $t_\delta > 0$ such that, for all $t$ and $v$ satisfying the inequalities $t \ge t_\delta$ and $vt \ge t_\delta$, we have

$$(1 - \delta)\min\{v^\delta, v^{-\delta}\} \le \frac{L(vt)}{L(t)} \le (1 + \delta)\max\{v^\delta, v^{-\delta}\}. \tag{A6.2.10}$$
Proof of Theorem A6.2.1 Assertion (i) is evident (just note that, in order to prove the last part of (i), one needs Theorem A6.1.1).

(ii) This property follows immediately from representation (A6.1.4) and the bound

$$\int_{t_0}^t \frac{\varepsilon(u)}{u}\,du = \int_{t_0}^{\ln t} + \int_{\ln t}^t = O\Big( \int_{t_0}^{\ln t} \frac{du}{u} \Big) + o\Big( \int_{\ln t}^t \frac{du}{u} \Big) = o(\ln t)$$

as $t \to \infty$.

(iii) In order to prove this property, notice that on the right-hand side of (A6.1.8), for any fixed $\delta > 0$ and $v_0 > 1$ and all $t$ large enough, we have

$$v^{-\delta/2} \le v_0^{-\delta/2} \le \frac{c(vt)}{c(t)} \le v_0^{\delta/2} \le v^{\delta/2}, \quad v \ge v_0,$$

and

$$\Big| \int_t^{vt} \frac{\varepsilon(u)}{u}\,du \Big| \le \frac{\delta}{2}\ln v$$

(by virtue of (A6.1.9)). This implies (A6.2.2).

(iv) By the dominated convergence theorem, we can choose an $M = M(t) \to \infty$ as $t \to \infty$ such that the convergence in (A6.1.1) will be uniform in $v \in [1, M]$. Changing the variable $u = vt$, we obtain

$$V^I(t) = t^{-\beta + 1} L(t) \int_1^\infty v^{-\beta}\,\frac{L(vt)}{L(t)}\,dv = t^{-\beta + 1} L(t) \Big[ \int_1^M + \int_M^\infty \Big]. \tag{A6.2.11}$$

If $\beta > 1$ then, as $t \to \infty$,

$$\int_1^M \sim \int_1^M v^{-\beta}\,dv \to \frac{1}{\beta - 1},$$

whereas by property (iii), for $\delta = (\beta - 1)/2$, we have

$$\int_M^\infty \le \int_M^\infty v^{-\beta + \delta}\,dv = \int_M^\infty v^{-(\beta + 1)/2}\,dv \to 0.$$

These relations together imply

$$V^I(t) \sim \frac{tV(t)}{\beta - 1}.$$

The case $\beta < 1$ can be treated quite similarly, but taking into account the uniform in $v \in [1/M, 1]$ convergence in (A6.1.1) and the equality

$$\int_0^1 v^{-\beta}\,dv = \frac{1}{1 - \beta}.$$

If $\beta = 1$ then the first integral on the right-hand side of (A6.2.11) is

$$\int_1^M \sim \int_1^M v^{-1}\,dv = \ln M,$$

so that if

$$\int_0^\infty V(u)\,du < \infty \tag{A6.2.12}$$

then

$$V^I(t) \ge (1 + o(1))L(t)\ln M \gg L(t) \tag{A6.2.13}$$

and hence

$$L_2(t) := \frac{V^I(t)}{tV(t)} = \frac{V^I(t)}{L(t)} \to \infty \quad \text{as } t \to \infty.$$




Note now that, by property (i), the function $L_2$ will be an s.v.f. whenever $V^I(t)$ is an s.v.f. But, for $v > 1$,

$$V^I(t) = V^I(vt) + \int_t^{vt} V(u)\,du,$$

where the last integral clearly does not exceed $(v - 1)L(t)(1 + o(1))$. By (A6.2.13) this implies that $V^I(vt)/V^I(t) \to 1$ as $t \to \infty$, which completes the proof of (A6.2.6).

That relation (A6.2.5) is true in the subcase when (A6.2.12) holds is almost obvious, since

$$V_I(t) = tV(t)L_1(t) = L(t)L_1(t) = \int_0^t V(u)\,du \to \int_0^\infty V(u)\,du,$$

so that, firstly, $L_1$ is an s.v.f. by property (i) and, secondly, $L_1(t) \to \infty$ because $L(t) \to 0$ by (A6.2.13).

Now let $\beta = 1$ and $\int_0^\infty V(u)\,du = \infty$. Then, as $M = M(t) \to \infty$ slowly enough, similarly to (A6.2.11) and (A6.2.13), by the uniform convergence theorem we have

$$V_I(t) = \int_0^1 v^{-1}L(vt)\,dv \ge \int_{1/M}^1 v^{-1}L(vt)\,dv \sim L(t)\ln M \gg L(t).$$

Therefore $L_1(t) := V_I(t)/L(t) \to \infty$ as $t \to \infty$. Further, also similarly to the above, we have, for $v \in (0, 1)$,

$$V_I(t) = V_I(vt) + \int_{vt}^t V(u)\,du,$$

where the last integral does not exceed $(1 - v)L(t)(1 + o(1)) \ll V_I(t)$, so that $V_I(t)$ (as well as $L_1(t)$ by virtue of property (i)) is an s.v.f. This completes the proof of property (iv).

(v) Clearly, by the uniform convergence theorem the quantity $b = b(t)$ is a solution to the “asymptotic equation”

$$V(b) \sim \frac{1}{t} \quad \text{as } t \to \infty \tag{A6.2.14}$$

(where the symbol $\sim$ can be replaced by the equality sign if the function $V$ is continuous and monotonically decreasing). Substituting $t^{1/\beta}L_b(t)$ for $b$, we obtain an equivalent relation

$$L_b^{-\beta}\,L(t^{1/\beta}L_b) \sim 1, \tag{A6.2.15}$$

where clearly

$$t^{1/\beta}L_b \to \infty \quad \text{as } t \to \infty. \tag{A6.2.16}$$

Fix an arbitrary $v > 0$. Substituting $vt$ for $t$ in (A6.2.15) and setting, for brevity’s sake, $L_2 = L_2(t) := L_b(vt)$, we get the relation

$$L_2^{-\beta}\,L(t^{1/\beta}L_2) \sim 1, \tag{A6.2.17}$$

since $L(v^{1/\beta}t^{1/\beta}L_2) \sim L(t^{1/\beta}L_2)$ by virtue of (A6.2.16) (with $L_b$ replaced with $L_2$). Now we will show by contradiction that (A6.2.15)–(A6.2.17) imply that $L_b \sim L_2$ as $t \to \infty$, which obviously means that $L_b$ is an s.v.f.

Indeed, the contrary assumption means that there exist $v_0 > 1$ and a sequence $t_n \to \infty$ such that

$$u_n := L_2(t_n)/L_b(t_n) > v_0, \quad n = 1, 2, \ldots \tag{A6.2.18}$$

(the possible alternative case can be dealt with in the same way). Clearly, $t_n^* := t_n^{1/\beta}L_b(t_n) \to \infty$ by (A6.2.16), so we obtain from (A6.2.15)–(A6.2.17) and property (iii) with $\delta = \beta/2$ that

$$1 \sim \frac{L_2^{-\beta}(t_n)\,L(t_n^{1/\beta}L_2(t_n))}{L_b^{-\beta}(t_n)\,L(t_n^{1/\beta}L_b(t_n))} = u_n^{-\beta}\,\frac{L(u_n t_n^*)}{L(t_n^*)} \le u_n^{-\beta}\,u_n^{\beta/2} = u_n^{-\beta/2} < v_0^{-\beta/2} < 1.$$

We get a contradiction.

Note that the above argument proves the uniqueness (up to asymptotic equivalence) of the solution to Eq. (A6.2.14).

Finally, relation (A6.2.9) can be proved by a direct verification of (A6.2.14) for $b := t^{1/\beta}L^{1/\beta}(t^{1/\beta})$: using (A6.2.8), we have

$$V(b) = b^{-\beta}L(b) = \frac{L(t^{1/\beta}L^{1/\beta}(t^{1/\beta}))}{t\,L(t^{1/\beta})} \sim \frac{L(t^{1/\beta})}{t\,L(t^{1/\beta})} = \frac{1}{t}.$$

The required assertion follows now by the aforementioned uniqueness of the solution to the asymptotic equation (A6.2.14). Theorem A6.2.1 is proved.


6.3 The Asymptotic Properties of the Transforms of R.V.F.s 
(Abel-Type Theorems) 


For an r.v.f. $V(t)$, its Laplace transform

$$\psi(\lambda) := \int_0^\infty e^{-\lambda t}\,V(t)\,dt < \infty$$

is defined for all $\lambda > 0$. The following asymptotic relations hold true for the transform.


Theorem A6.3.1 Assume that $V(t) \in \mathcal{R}$ (i.e. $V(t)$ has the form (A6.1.2)).

(i) If $\beta \in [0, 1)$ then

$$\psi(\lambda) \sim \frac{\Gamma(1 - \beta)}{\lambda}\,V(1/\lambda) \quad \text{as } \lambda \downarrow 0. \tag{A6.3.1}$$

(ii) If $\beta = 1$ and $\int_0^\infty V(t)\,dt = \infty$ then

$$\psi(\lambda) \sim V_I(1/\lambda) \quad \text{as } \lambda \downarrow 0, \tag{A6.3.2}$$

where $V_I(t) = \int_0^t V(u)\,du \to \infty$ is an s.v.f. such that $V_I(t) \gg L(t)$ as $t \to \infty$.

(iii) In any case, $\psi(\lambda) \uparrow V_I(\infty) = \int_0^\infty V(t)\,dt \le \infty$ as $\lambda \downarrow 0$.


Assertions (i) and (ii) are called Abelian theorems.
If we resolve relation (A6.3.1) for $V$ then we obtain

$$V(t) \sim \frac{\psi(1/t)}{t\,\Gamma(1 - \beta)} \quad \text{as } t \to \infty.$$

Relations of this type will also be valid in the case when, instead of the regularity of the function $V$, one requires the monotonicity of $V$ and assumes that $\psi(\lambda)$ is an r.v.f. as $\lambda \downarrow 0$. Statements of such type are called Tauberian theorems. We will not need these theorems and so will not dwell on them.
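As a sanity check of the Abelian relation in assertion (i), consider the pure power function $V(t) = t^{-\beta}$ with $\beta \in [0, 1)$, that is, $L \equiv 1$ (a choice made here for illustration only). In this case the relation holds with exact equality for every $\lambda > 0$, as the substitution $u = \lambda t$ shows:

```latex
\psi(\lambda)
  = \int_0^\infty e^{-\lambda t}\, t^{-\beta}\, dt
  = \lambda^{\beta - 1} \int_0^\infty e^{-u}\, u^{-\beta}\, du   % substitution u = \lambda t
  = \Gamma(1 - \beta)\, \lambda^{\beta - 1}
  = \frac{\Gamma(1 - \beta)}{\lambda}\, V(1/\lambda).
```

For a general r.v.f. the slowly varying factor perturbs this identity only asymptotically, which is what the proof has to control.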


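Proof of Theorem A6.3.1 (i) For any fixed $\varepsilon > 0$ we have

$$\psi(\lambda) = \int_0^{\varepsilon/\lambda} + \int_{\varepsilon/\lambda}^\infty,$$

where, for the first integral on the right-hand side, for $\beta < 1$, by virtue of (A6.2.4) we have the following relation:

$$\int_0^{\varepsilon/\lambda} e^{-\lambda t}\,V(t)\,dt \le \int_0^{\varepsilon/\lambda} V(t)\,dt \sim \frac{\varepsilon V(\varepsilon/\lambda)}{\lambda(1 - \beta)} \quad \text{as } \lambda \downarrow 0. \tag{A6.3.3}$$

Changing the variable $\lambda t = u$, we can rewrite the second integral in the above representation for $\psi(\lambda)$ as

$$\int_{\varepsilon/\lambda}^\infty = \frac{V(1/\lambda)}{\lambda} \int_\varepsilon^\infty e^{-u}u^{-\beta}\,\frac{L(u/\lambda)}{L(1/\lambda)}\,du = \frac{V(1/\lambda)}{\lambda} \Big[ \int_\varepsilon^2 + \int_2^\infty \Big]. \tag{A6.3.4}$$

Each of the integrals on the right-hand side converges, as $\lambda \downarrow 0$, to the corresponding integral of $e^{-u}u^{-\beta}$: the former integral converges by the uniform convergence theorem (the convergence $L(u/\lambda)/L(1/\lambda) \to 1$ is uniform in $u \in [\varepsilon, 2]$), and the latter converges by virtue of (A6.1.1) and the dominated convergence theorem, since by Theorem A6.2.1(iii), for all $\lambda$ small enough, we have $L(u/\lambda)/L(1/\lambda) < u$ for $u \ge 2$. Therefore,

$$\int_{\varepsilon/\lambda}^\infty \sim \frac{V(1/\lambda)}{\lambda} \int_\varepsilon^\infty u^{-\beta}e^{-u}\,du. \tag{A6.3.5}$$

Now note that, as $\lambda \downarrow 0$,

$$\frac{\varepsilon V(\varepsilon/\lambda)}{\lambda} = \varepsilon^{1 - \beta}\,\frac{V(1/\lambda)}{\lambda}\,\frac{L(\varepsilon/\lambda)}{L(1/\lambda)} \sim \varepsilon^{1 - \beta}\,\frac{V(1/\lambda)}{\lambda}.$$

Since $\varepsilon > 0$ can be chosen arbitrarily small, this relation together with (A6.3.3) and (A6.3.5) completes the proof of (A6.3.1).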

(ii) Integrating by parts and changing the variable $\lambda t = u$, we obtain, for $\beta = 1$ and $M > 0$, that

$$\psi(\lambda) = \int_0^\infty e^{-\lambda t}\,dV_I(t) = -\int_0^\infty V_I(t)\,de^{-\lambda t} = \int_0^\infty V_I(u/\lambda)\,e^{-u}\,du = \int_0^{1/M} + \int_{1/M}^M + \int_M^\infty. \tag{A6.3.6}$$

By Theorem A6.2.1(iv), $V_I(t) \gg L(t)$ is an s.v.f. as $t \to \infty$. Therefore, by the uniform convergence theorem, for $M = M(\lambda) \to \infty$ slowly enough as $\lambda \to 0$, the middle integral on the right-hand side of (A6.3.6) is

$$V_I(1/\lambda) \int_{1/M}^M \frac{V_I(u/\lambda)}{V_I(1/\lambda)}\,e^{-u}\,du \sim V_I(1/\lambda) \int_{1/M}^M e^{-u}\,du \sim V_I(1/\lambda).$$

The remaining two integrals are negligibly small: since $V_I(t)$ is an increasing function, the first integral does not exceed $V_I(1/(\lambda M))/M = o(V_I(1/\lambda))$, while for the last integral we have by Theorem A6.2.1(iii) that

$$V_I(1/\lambda) \int_M^\infty \frac{V_I(u/\lambda)}{V_I(1/\lambda)}\,e^{-u}\,du \le V_I(1/\lambda) \int_M^\infty u e^{-u}\,du = o(V_I(1/\lambda)).$$

Hence (ii) is proved. Assertion (iii) is evident.


6.4 Subexponential Distributions and Their Properties 


Let $\xi, \xi_1, \xi_2, \ldots$ be independent identically distributed random variables with
distribution $F$, and let the right tail of this distribution

$$F_+(t) := F([t,\infty)) = \mathbf{P}(\xi \ge t), \quad t \in \mathbb{R},$$

be an r.v.f. as $t \to \infty$ of the form (A6.1.2), which we will denote by $V(t)$. Recall
that we denoted the class of all such distributions by $\mathcal{R}$.

In this section we will introduce one more class of distributions, which is
substantially wider than $\mathcal{R}$.

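Let $\zeta \in \mathbb{R}$ be a random variable with distribution $G$: $G(B) = \mathbf{P}(\zeta \in B)$ for any
Borel set $B$ (recall that in this case we write $\zeta \mathrel{\subset\!\!\!=} G$). Denote by $G(t)$ the right tail
of the distribution of the random variable $\zeta$:

$$G(t) := \mathbf{P}(\zeta \ge t), \quad t \in \mathbb{R}.$$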




The convolution of tails $G_1(t)$ and $G_2(t)$ is the function

$$G_1 * G_2(t) := -\int G_1(t-y)\,dG_2(y) = \int G_1(t-y)\,G_2(dy) = \mathbf{P}(Z_2 \ge t),$$

where $Z_2 = \zeta_1 + \zeta_2$ is the sum of independent random variables $\zeta_i \mathrel{\subset\!\!\!=} G_i$, $i = 1, 2$.
Clearly, $G_1 * G_2(t) = G_2 * G_1(t)$. Denote by $G^{2*}(t) := G * G(t)$ the convolution
of the tail $G(t)$ with itself and put $G^{(n+1)*}(t) := G * G^{n*}(t)$, $n \ge 2$.


Definition A6.4.1 A distribution $G$ on $[0, \infty)$ belongs to the class $\mathcal{S}_+$ of
subexponential distributions on the positive half-line if

$$G^{2*}(t) \sim 2G(t) \quad \text{as } t \to \infty. \tag{A6.4.1}$$

A distribution $G$ on the whole line $(-\infty, \infty)$ belongs to the class $\mathcal{S}$ of
subexponential distributions if the distribution $G^+$ of the positive part $\zeta^+ = \max\{0, \zeta\}$ of the
random variable $\zeta \mathrel{\subset\!\!\!=} G$ belongs to $\mathcal{S}_+$. A random variable is called subexponential
if its distribution is subexponential.
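As an informal numerical illustration of (A6.4.1), the following sketch approximates the ratio $G^{2*}(t)/G(t)$ for two distributions on $[0, \infty)$ that have densities. The function names, the midpoint quadrature, and the two example tails (a Pareto-type tail $(1+t)^{-2}$ and the exponential tail $e^{-t}$) are assumptions chosen for illustration only; they are not taken from the text.

```python
import math

def conv_ratio(tail, density, t, steps=200_000):
    """Approximate G^{2*}(t) / G(t) for a distribution on [0, inf) with a density.

    Uses G^{2*}(t) = G(t) + int_0^t G(t - y) g(y) dy, where g is the density,
    and a midpoint rule for the integral.
    """
    h = t / steps
    integral = 0.0
    for k in range(steps):
        y = (k + 0.5) * h
        integral += tail(t - y) * density(y)
    integral *= h
    return 1.0 + integral / tail(t)

def pareto_tail(t):
    # Assumed example: G(t) = (1 + t)^(-2), a regularly varying tail.
    return (1.0 + t) ** -2.0

def pareto_density(y):
    return 2.0 * (1.0 + y) ** -3.0

def exp_tail(t):
    # Assumed counterexample: G(t) = exp(-t); here G^{2*}(t) / G(t) = 1 + t.
    return math.exp(-t)

def exp_density(y):
    return math.exp(-y)

pareto_ratio = conv_ratio(pareto_tail, pareto_density, 1000.0)
exp_ratio = conv_ratio(exp_tail, exp_density, 10.0)
```

For the Pareto-type tail the ratio is close to 2 at large $t$, in line with (A6.4.1), whereas for the exponential tail it grows linearly in $t$, so that distribution is not subexponential.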


As we will see below (Theorem A6.4.3), the subexponentiality property of a distribution
$G$ is essentially the property of the asymptotics of the tail $G(t)$ as $t \to \infty$.
Therefore we can also speak about subexponential functions.

A nonincreasing function $G_1(t)$ on $(0, \infty)$ is called subexponential if a distribution
$G$ with the tail $G(t) \sim cG_1(t)$ as $t \to \infty$ with some $c > 0$ is subexponential.
(For example, distributions with the tails $G(t) = G_1(t)/G_1(0)$ or
$G(t) = \min(1, G_1(t))$.)


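Remark A6.4.1 Since we obviously always have

$$(G^+)^{2*}(t) = \mathbf{P}(\zeta_1^+ + \zeta_2^+ \ge t) \ge \mathbf{P}\big(\{\zeta_1^+ \ge t\} \cup \{\zeta_2^+ \ge t\}\big)$$
$$= \mathbf{P}(\zeta_1 \ge t) + \mathbf{P}(\zeta_2 \ge t) - \mathbf{P}(\zeta_1 \ge t,\ \zeta_2 \ge t)$$
$$= 2G(t) - G^2(t) = 2G^+(t)(1 + o(1))$$

as $t \to \infty$, subexponentiality is equivalent to the following property:

$$\limsup_{t \to \infty} \frac{(G^+)^{2*}(t)}{G^+(t)} \le 2. \tag{A6.4.2}$$

Note also that, since relation (A6.4.1) makes sense only when $G(t) > 0$ for all $t \in \mathbb{R}$,
the support of any subexponential distribution is unbounded from the right.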


We show that regularly varying distributions are subexponential, i.e., that $\mathcal{R} \subset \mathcal{S}$.
Let $F \in \mathcal{R}$ and $\mathbf{P}(\xi \ge t) = V(t)$ be an r.v.f. We need to show that

$$\mathbf{P}(\xi_1 + \xi_2 \ge x) = V^{2*}(x) = V * V(x) = -\int V(x-t)\,dV(t) \sim 2V(x). \tag{A6.4.3}$$




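In order to do that, we introduce events $A := \{\xi_1 + \xi_2 \ge x\}$ and $B_i := \{\xi_i < x/2\}$,
$i = 1, 2$. Clearly,

$$\mathbf{P}(A) = \mathbf{P}(AB_1) + \mathbf{P}(AB_2) - \mathbf{P}(AB_1B_2) + \mathbf{P}(A\overline{B}_1\overline{B}_2),$$

where $\mathbf{P}(AB_1B_2) = 0$, $\mathbf{P}(A\overline{B}_1\overline{B}_2) = \mathbf{P}(\overline{B}_1\overline{B}_2) = V^2(x/2)$ (here and in what
follows, $\overline{B}$ denotes the event complementary to $B$) and

$$\mathbf{P}(AB_1) = \mathbf{P}(AB_2) = \int_{-\infty}^{x/2} V(x-t)\,F(dt).$$

Therefore

$$V^{2*}(x) = 2\int_{-\infty}^{x/2} V(x-t)\,F(dt) + V^2(x/2). \tag{A6.4.4}$$

(The same result can be obtained by integrating the convolution in (A6.4.3) by
parts.) It remains to note that $V^2(x/2) = o(V(x))$ and

$$\int_{-\infty}^{x/2} V(x-t)\,F(dt) = \int_{-\infty}^{-M} + \int_{-M}^{M} + \int_{M}^{x/2}, \tag{A6.4.5}$$

where, as one can easily see, for any $M = M(x) \to \infty$ as $x \to \infty$ such that
$M = o(x)$, we have

$$\int_{-M}^{M} \sim V(x) \quad \text{and} \quad \int_{-\infty}^{-M} + \int_{M}^{x/2} = o(V(x)),$$

which proves (A6.4.3).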
The same argument is valid for distributions with a right tail of the form

$$F_+(t) = e^{-t^{\beta}L(t)}, \quad \beta \in (0, 1), \tag{A6.4.6}$$

where $L(t)$ is an s.v.f. as $t \to \infty$ satisfying a certain smoothness condition (for
instance, that $L$ is differentiable with $L'(t) = o(L(t)/t)$ as $t \to \infty$).

One of the basic properties of subexponential distributions G is that their tails 
G(t) are asymptotically locally constant in the following sense. 


Definition A6.4.2 We will call a function $G(t) > 0$ (asymptotically) locally constant
(l.c.) if, for any fixed $v$,

$$\frac{G(t+v)}{G(t)} \to 1 \quad \text{as } t \to \infty. \tag{A6.4.7}$$

In the literature, distributions with l.c. tails are often referred to as long-tailed
distributions; however, we feel that the term “locally constant function” better
reflects the meaning of the concept. Denote the class of all distributions $G$ with l.c.
tails $G(t)$ by $\mathcal{L}$.
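Relation (A6.4.7) is easy to probe numerically. The sketch below evaluates $G(t+v)/G(t)$ for three example tails; the function names and the particular tails (a Pareto-type tail, a tail $e^{-\sqrt{t}}$ of the form (A6.4.6) with $L \equiv 1$, and the exponential tail) are assumptions made for illustration.

```python
import math

def shift_ratio(tail, t, v):
    """Return G(t + v) / G(t) for a tail function G."""
    return tail(t + v) / tail(t)

def pareto_tail(t):
    # Assumed example: G(t) = (1 + t)^(-1.5), regularly varying.
    return (1.0 + t) ** -1.5

def weibull_tail(t):
    # Assumed example: G(t) = exp(-sqrt(t)), of the form (A6.4.6) with beta = 1/2.
    return math.exp(-math.sqrt(t))

def exp_tail(t):
    # Assumed counterexample: G(t) = exp(-t).
    return math.exp(-t)

v = 5.0
ratios = {
    name: [shift_ratio(g, t, v) for t in (1e2, 1e3, 1e5)]
    for name, g in (("pareto", pareto_tail), ("weibull", weibull_tail))
}
# For the exponential tail the ratio equals exp(-v) for every t.
exp_ratios = [shift_ratio(exp_tail, t, v) for t in (10.0, 100.0, 500.0)]
```

The first two ratios approach 1 as $t$ grows, while for the exponential tail the ratio stays at $e^{-v}$, so that tail is not l.c.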

For future reference, we will state the basic properties of l.c. functions as a sepa- 
rate theorem. 




Theorem A6.4.1 (i) For an l.c. function $G(t)$ the convergence in (A6.4.7) is uniform
in $v$ on any fixed finite interval.

(ii) A function $G(t)$ is l.c. if and only if, for some $t_0 > 0$, it admits a representation
of the form

$$G(t) = c(t)\exp\Big\{-\int_{t_0}^{t}\varepsilon(u)\,du\Big\}, \quad t \ge t_0, \tag{A6.4.8}$$

where the functions $c(t)$ and $\varepsilon(t)$ are measurable and such that $c(t) \to c \in (0, \infty)$
and $\varepsilon(t) \to 0$ as $t \to \infty$.

(iii) If $G_1(t)$ and $G_2(t)$ are l.c. functions then $G_1(t) + G_2(t)$, $G_1(t)G_2(t)$, $G_1^b(t)$,
and $G(t) := G_1(at + b)$, where $a > 0$ and $b \in \mathbb{R}$, are also l.c.

(iv) If $G(t)$ is an l.c. function then, for any $\varepsilon > 0$,

$$e^{\varepsilon t}G(t) \to \infty \quad \text{as } t \to \infty.$$

In other words, any l.c. function $G(t)$ can be represented as

$$G(t) = e^{-l(t)}, \quad l(t) = o(t) \quad \text{as } t \to \infty. \tag{A6.4.9}$$

(v) Let

$$G^I(t) := \int_t^\infty G(u)\,du < \infty$$

and at least one of the following conditions be satisfied:

(a) $G(t)$ is an l.c. function; or
(b) $G^I(t)$ is an l.c. function and $G(t)$ is monotone.

Then

$$G(t) = o(G^I(t)) \quad \text{as } t \to \infty. \tag{A6.4.10}$$

(vi) If $G \in \mathcal{L}$ then $G^{2*}(t) \sim (G^+)^{2*}(t)$ as $t \to \infty$.


Remark A6.4.2 Assertion (i) of the theorem implies that the uniform convergence
in (A6.4.7) on the interval $[-M, M]$ persists in the case when, as $t \to \infty$, $M = M(t)$
grows unboundedly slowly enough.


Proof of Theorem A6.4.1, (i)-(iii) It is clear from Definitions A6.4.1 and A6.4.2
that $G(t)$ is l.c. if and only if $L(t) := G(\ln t)$ is an s.v.f. Having made this
observation, assertion (i) follows directly from Theorem A6.1.1 (on uniform convergence
of s.v.f.s), while assertions (ii) and (iii) follow from Theorems A6.1.2 and A6.2.1(i),
respectively.

Assertion (iv) follows from the integral representation (A6.4.8). 

(v) If (a) holds then, for any $M > 0$ and all $t$ large enough,

$$G^I(t) > \int_t^{t+M} G(u)\,du > \frac{1}{2}MG(t).$$

Since $M$ is arbitrary, $G^I(t) \gg G(t)$. Further, if (b) holds then

$$\frac{G(t)}{G^I(t)} \le \frac{1}{G^I(t)}\int_{t-1}^{t} G(u)\,du = \frac{G^I(t-1)}{G^I(t)} - 1 \to 0$$

as $t \to \infty$.
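(vi) Let $\zeta_1$ and $\zeta_2$ be independent copies of a random variable $\zeta$, $Z_2 := \zeta_1 + \zeta_2$,
$Z_2^{(+)} := \zeta_1^+ + \zeta_2^+$. Clearly, $\zeta_i \le \zeta_i^+$, so that

$$G^{2*}(t) = \mathbf{P}(Z_2 \ge t) \le \mathbf{P}(Z_2^{(+)} \ge t) = (G^+)^{2*}(t). \tag{A6.4.11}$$

On the other hand, for any $M > 0$,

$$G^{2*}(t) \ge \mathbf{P}(Z_2 \ge t,\ \zeta_1 > 0,\ \zeta_2 > 0) + \sum_{i=1}^{2}\mathbf{P}(Z_2 \ge t,\ \zeta_i \in [-M, 0]),$$

where the first term on the right-hand side is equal to $\mathbf{P}(Z_2^{(+)} \ge t,\ \zeta_1^+ > 0,\ \zeta_2^+ > 0)$,
and the last two terms can be bounded as follows: since $G \in \mathcal{L}$, then, for any $\varepsilon > 0$
and $M$ and $t$ large enough,

$$\mathbf{P}(Z_2 \ge t,\ \zeta_1 \in [-M, 0]) \ge \mathbf{P}(\zeta_2 \ge t + M,\ \zeta_1 \in [-M, 0])$$
$$= \frac{G(t+M)}{G(t)}\,G(t)\big[\mathbf{P}(\zeta_1 \le 0) - \mathbf{P}(\zeta_1 < -M)\big]$$
$$\ge (1-\varepsilon)G(t)\mathbf{P}(\zeta_1^+ = 0) = (1-\varepsilon)\mathbf{P}(Z_2^{(+)} \ge t,\ \zeta_1^+ = 0).$$

Thus we obtain for $G^{2*}(t)$ the lower bound

$$G^{2*}(t) \ge \mathbf{P}(Z_2^{(+)} \ge t,\ \zeta_1^+ > 0,\ \zeta_2^+ > 0) + (1-\varepsilon)\sum_{i=1}^{2}\mathbf{P}(Z_2^{(+)} \ge t,\ \zeta_i^+ = 0)$$
$$\ge (1-\varepsilon)\mathbf{P}(Z_2^{(+)} \ge t) = (1-\varepsilon)(G^+)^{2*}(t).$$

Therefore (vi) is proved, as $\varepsilon$ can be arbitrarily small. The theorem is proved.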


We return now to our discussion of subexponential distributions. First of all, we
turn to the relationship between the classes $\mathcal{S}$ and $\mathcal{L}$.


Theorem A6.4.2 We have $\mathcal{S} \subset \mathcal{L}$, and hence all the assertions of Theorem A6.4.1
are valid for subexponential distributions as well.


Remark A6.4.3 The coinage of the term “subexponential distribution” was apparently
due mostly to the fact that the tail of such a distribution decreases as $t \to \infty$
slower than any exponential function $e^{-\varepsilon t}$, as shown in Theorems A6.4.1(iv) and
A6.4.2.




Remark A6.4.4 In the case when the distribution $G$ is not concentrated on $[0, \infty)$,
the tails’ additivity condition (A6.4.1) alone is not sufficient for the function $G(t)$
to be l.c. (and hence for ensuring the “subexponential decay” of the distribution tail,
cf. Remark A6.4.3). This explains the necessity of defining subexponentiality in the
general case in terms of condition (A6.4.1) on the distribution $G^+$ of the random
variable $\zeta^+$. Actually, as we will see below (Corollary A6.4.1), the subexponentiality
of a distribution $G$ on $\mathbb{R}$ is equivalent to the combination of conditions (A6.4.1)
(on $G$ itself) and $G \in \mathcal{L}$.

The next example shows that, for random variables assuming values of both
signs, condition (A6.4.1), generally speaking, does not imply the subexponential
behaviour of $G(t)$.


Example A6.4.1 Let $\mu > 0$ be fixed and the right tail of the distribution $G$ have the
form

$$G(t) = e^{-\mu t}V(t), \tag{A6.4.12}$$

where $V(t)$ is an r.v.f. vanishing as $t \to \infty$ and such that

$$g(\mu) := \int_{-\infty}^{\infty} e^{\mu y}\,G(dy) < \infty.$$

Similarly to (A6.4.4) and (A6.4.5), we have

$$G^{2*}(t) = 2\int_{-\infty}^{t/2} G(t-y)\,G(dy) + G^2(t/2),$$

where

$$\int_{-\infty}^{t/2} G(t-y)\,G(dy) = e^{-\mu t}\int_{-\infty}^{t/2} e^{\mu y}V(t-y)\,G(dy)
= e^{-\mu t}\Big[\int_{-\infty}^{-M} + \int_{-M}^{M} + \int_{M}^{t/2}\Big].$$

One can easily see that, for $M = M(t) \to \infty$ slowly enough as $t \to \infty$, we have

$$\int_{-M}^{M} e^{\mu y}V(t-y)\,G(dy) \sim g(\mu)V(t), \qquad \int_{-\infty}^{-M} + \int_{M}^{t/2} = o(V(t)),$$

while

$$G^2(t/2) = e^{-\mu t}V^2(t/2) \le c\,e^{-\mu t}V^2(t) = o(G(t)).$$

Thus, we obtain

$$G^{2*}(t) \sim 2g(\mu)e^{-\mu t}V(t) = 2g(\mu)G(t), \tag{A6.4.13}$$

and it is clear that we can always find a distribution $G$ (with a negative mean) such
that $g(\mu) = 1$. In that case relation (A6.4.1) from the definition of subexponentiality
will be satisfied, although $G(t)$ decreases exponentially fast and hence is not an l.c.
function.

On the other hand, note that the class of distributions satisfying relation (A6.4.1)
only is an extension of the class $\mathcal{S}$. Distributions in the former class possess many
of the properties of distributions from $\mathcal{S}$.


Proof of Theorem A6.4.2 We have to prove that $\mathcal{S} \subset \mathcal{L}$. Since the definitions of both
classes are given in terms of the right distribution tails, we can assume without loss
of generality that $G \in \mathcal{S}_+$ (or just consider the distribution $G^+$). For independent
(nonnegative) $\zeta_i \mathrel{\subset\!\!\!=} G$ we have, for $t > 0$,

$$G^{2*}(t) = \mathbf{P}(\zeta_1 + \zeta_2 \ge t) = \mathbf{P}(\zeta_1 \ge t) + \mathbf{P}(\zeta_1 + \zeta_2 \ge t,\ \zeta_1 < t)
= G(t) + \int_0^t G(t-y)\,G(dy). \tag{A6.4.14}$$

Since $G(t)$ is non-increasing and $G(0) = 1$, it follows that, for $t > v > 0$,

$$\frac{G^{2*}(t)}{G(t)} = 1 + \int_0^v \frac{G(t-y)}{G(t)}\,G(dy) + \int_v^t \frac{G(t-y)}{G(t)}\,G(dy)
\ge 1 + [1 - G(v)] + \frac{G(t-v)}{G(t)}\,[G(v) - G(t)].$$

Therefore, for $t$ large enough (such that $G(v) - G(t) > 0$),

$$1 \le \frac{G(t-v)}{G(t)} \le \frac{1}{G(v) - G(t)}\Big[\frac{G^{2*}(t)}{G(t)} - 2 + G(v)\Big].$$

Since $G \in \mathcal{S}_+$, the right-hand side of the last formula converges as $t \to \infty$ to the
quantity $G(v)/G(v) = 1$ and hence $G \in \mathcal{L}$. The theorem is proved.


The next theorem contains several important properties of subexponential distri- 
butions. 


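Theorem A6.4.3 Let $G \in \mathcal{S}$.
(i) If $G_i(t)/G(t) \to c_i$ as $t \to \infty$, $c_i \ge 0$, $i = 1, 2$, $c_1 + c_2 > 0$, then

$$G_1 * G_2(t) \sim G_1(t) + G_2(t) \sim (c_1 + c_2)G(t).$$

(ii) If $G_0(t) \sim cG(t)$ as $t \to \infty$, $c > 0$, then $G_0 \in \mathcal{S}$.
(iii) For any fixed $n \ge 2$,

$$G^{n*}(t) \sim nG(t) \quad \text{as } t \to \infty. \tag{A6.4.15}$$

(iv) For any $\varepsilon > 0$ there exists a $b = b(\varepsilon) < \infty$ such that

$$\frac{G^{n*}(t)}{G(t)} \le b(1+\varepsilon)^n$$

for all $n \ge 2$ and $t$.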




In addition to assertions (i) and (ii) of the theorem, we can also show that if $G \in \mathcal{S}$
and the function $m(t) \in \mathcal{L}$ possesses the property

$$0 < m_1 \le m(t) \le m_2 < \infty$$

then $G_1(t) = m(t)G(t) \in \mathcal{S}$.

Theorems A6.4.1(vi), A6.4.2 and A6.4.3(iii) imply the following simple state- 
ment elucidating the subexponentiality condition for random variables taking values 
of both signs. 


Corollary A6.4.1 A distribution $G$ belongs to $\mathcal{S}$ if and only if $G \in \mathcal{L}$ and
$G^{2*}(t) \sim 2G(t)$ as $t \to \infty$.


Remark A6.4.5 Evidently the asymptotic relation $G_1(t) \sim G_2(t)$ as $t \to \infty$ is an
equivalence relation on the set of distributions on $\mathbb{R}$. Theorem A6.4.3(ii) means
that the class $\mathcal{S}$ is closed with respect to that equivalence. One can easily see that in
each of the equivalence subclasses of the class $\mathcal{S}$ with respect to this relation there
is always a distribution with an arbitrarily smooth tail $G(t)$.


Indeed, let $p(t)$ be an infinitely differentiable probability density on $\mathbb{R}$ vanishing
outside $[0, 1]$ (we can take, e.g., $p(x) = c \cdot e^{-1/(x(1-x))}$ if $x \in (0, 1)$ and $p(x) = 0$
if $x \notin (0, 1)$). Now we “smooth” the function $l(t) := -\ln G(t)$, $G \in \mathcal{S}$, putting

$$l_0(t) := \int p(t-u)\,l(u)\,du, \quad \text{and let } G_0(t) := e^{-l_0(t)}. \tag{A6.4.16}$$

Clearly, $G_0(t)$ is an infinitely differentiable function and, since $l(t)$ is nondecreasing
and we actually integrate over $[t-1, t]$ only, one has $l(t-1) \le l_0(t) \le l(t)$ and
hence by Theorem A6.4.2

$$1 \le \frac{G_0(t)}{G(t)} \le \frac{G(t-1)}{G(t)} \to 1 \quad \text{as } t \to \infty.$$

Thus, the distribution $G_0$ is equivalent to the original $G$. A simpler smoothing
procedure leading to a less smooth asymptotically equivalent tail consists of replacing
the function $l(t)$ with its linear interpolation with nodes at points $(k, l(k))$, $k$ being
an integer.

Therefore, up to a summand $o(1)$, we can always assume the function $l(t) =
-\ln G(t)$, $G \in \mathcal{S}$, to be arbitrarily smooth.

The aforesaid is clearly applicable to the class $\mathcal{L}$ as well: it is also closed with
respect to the introduced equivalence, and each of its equivalence subclasses contains
arbitrarily smooth representatives.
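The simpler of the two smoothing procedures, linear interpolation of $l(t) = -\ln G(t)$ between integer nodes, can be sketched as follows. The helper names and the example step tail $G(t) = (1 + \lfloor t \rfloor)^{-1.5}$ (which is asymptotically equivalent to a Pareto-type tail) are assumptions used only for illustration.

```python
import math

def smoothed_tail(tail, t):
    """Return exp(-l0(t)), where l0 linearly interpolates l(t) = -ln G(t)
    between the integer nodes (k, l(k)) and (k + 1, l(k + 1)), k = floor(t)."""
    k = math.floor(t)
    l_left = -math.log(tail(k))
    l_right = -math.log(tail(k + 1))
    l0 = l_left + (t - k) * (l_right - l_left)
    return math.exp(-l0)

def step_tail(t):
    # Assumed example: a piecewise constant tail with jumps at the integers.
    return (1.0 + math.floor(t)) ** -1.5

# The ratio G0(t) / G(t) of the smoothed tail to the original one.
ratio_small_t = smoothed_tail(step_tail, 3.5) / step_tail(3.5)
ratio_large_t = smoothed_tail(step_tail, 1000.5) / step_tail(1000.5)
```

The interpolated tail is continuous and agrees with the original tail at the integer nodes; the ratio of the two tails approaches 1 as $t$ grows, so the two tails are asymptotically equivalent.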


Remark A6.4.6 Theorem A6.4.3(ii) and (iii) immediately implies that if $G \in \mathcal{S}$ then
also $G^{n*} \in \mathcal{S}$, $n = 2, 3, \ldots$. Moreover, if we denote by $G^{n\vee}$ the distribution of the
maximum of independent identically distributed random variables $\zeta_1, \ldots, \zeta_n \mathrel{\subset\!\!\!=} G$,
then the evident relation

$$G^{n\vee}(t) = 1 - (1 - G(t))^n \sim nG(t) \quad \text{as } t \to \infty \tag{A6.4.17}$$

and Theorem A6.4.3(ii) imply that $G^{n\vee}$ also belongs to $\mathcal{S}$.

Relations (A6.4.17) and (A6.4.15) show that, in the case of a subexponential
$G$, the tail $G^{n*}(t)$ of the distribution of the sum of a fixed number $n$ of independent
identically distributed random variables $\zeta_i \mathrel{\subset\!\!\!=} G$ is asymptotically equivalent
(as $t \to \infty$) to the tail $G^{n\vee}(t)$ of the maximum of these random variables, i.e., the
“large” values of this sum are mainly due to the presence of one “large” term $\zeta_i$
in the sum. It is easy to see that this property is characteristic of subexponentiality.
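The equivalence of the tails of the sum and of the maximum can be observed in a small Monte Carlo experiment. The sketch below is only an illustration under stated assumptions: the function name, the Pareto-type tail $G(x) = (1+x)^{-\alpha}$ on $[0, \infty)$, the parameter values and the fixed seed are all hypothetical choices.

```python
import random

def sum_vs_max_tail(n, t, alpha, samples, seed=0):
    """Estimate P(zeta_1 + ... + zeta_n >= t) and P(max zeta_i >= t) by simulation
    for i.i.d. zeta_i with P(zeta >= x) = (1 + x)^(-alpha), x >= 0."""
    rng = random.Random(seed)
    sum_hits = 0
    max_hits = 0
    for _ in range(samples):
        # Inverse-transform sampling; 1 - U lies in (0, 1], so the power is finite.
        xs = [(1.0 - rng.random()) ** (-1.0 / alpha) - 1.0 for _ in range(n)]
        sum_hits += sum(xs) >= t
        max_hits += max(xs) >= t
    return sum_hits / samples, max_hits / samples

n, t, alpha = 3, 50.0, 1.5
p_sum, p_max = sum_vs_max_tail(n, t, alpha, samples=200_000)

# Exact tail of the maximum from (A6.4.17).
g = (1.0 + t) ** -alpha
p_max_exact = 1.0 - (1.0 - g) ** n
```

Since the summands are nonnegative, the event for the maximum is contained in the event for the sum, so `p_sum >= p_max` always holds; for large `t` the two estimates are of the same order and both are close to $nG(t)$.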


Remark A6.4.7 Note also that an assertion converse to what was stated at the
beginning of Remark A6.4.6 is also valid: if $G^{n*} \in \mathcal{S}$ for some $n \ge 2$ then $G \in \mathcal{S}$
as well. That $G^{n\vee} \in \mathcal{S}$ implies $G \in \mathcal{S}$ evidently follows from (A6.4.17) and
Theorem A6.4.3(ii).


Proof of Theorem A6.4.3 (i) First assume that $c_1c_2 > 0$ and that both distributions
$G_i$ are concentrated on $[0, \infty)$. Fix an arbitrary $\varepsilon > 0$ and choose $M$ large enough
to have $G_i(M) < \varepsilon$, $i = 1, 2$, and $G(M) < \varepsilon$, and such that, for $t > M$,

$$(1-\varepsilon)c_i < \frac{G_i(t)}{G(t)} < (1+\varepsilon)c_i, \quad i = 1, 2, \qquad \frac{G(t-M)}{G(t)} < 1 + \varepsilon \tag{A6.4.18}$$

(the last inequality holds by virtue of Theorem A6.4.2).
Let $\zeta \mathrel{\subset\!\!\!=} G$ and $\zeta_i \mathrel{\subset\!\!\!=} G_i$, $i = 1, 2$, be independent random variables. Then, for
$t > 2M$, we have the representation

$$G_1 * G_2(t) = P_1 + P_2 + P_3 + P_4, \tag{A6.4.19}$$

where

$$P_1 := \mathbf{P}(\zeta_1 \ge t - \zeta_2,\ \zeta_2 \in [0, M)),$$
$$P_2 := \mathbf{P}(\zeta_2 \ge t - \zeta_1,\ \zeta_1 \in [0, M)),$$
$$P_3 := \mathbf{P}(\zeta_2 \ge t - \zeta_1,\ \zeta_1 \in [M, t-M)),$$
$$P_4 := \mathbf{P}(\zeta_2 \ge M,\ \zeta_1 \ge t - M)$$

(see Fig. A.1).

Fig. A.1 Illustration to the proof of Theorem A6.4.3, showing the regions $P_i$, $i = 1, 2, 3, 4$

We show that the first two terms on the right-hand side of (A6.4.19) are asymptotically
equivalent to $c_1G(t)$ and $c_2G(t)$, respectively, while the last two terms are
negligibly small compared with $G(t)$. Indeed, for $P_1$ we have the obvious two-sided
bounds

$$(1-\varepsilon)^2c_1G(t) < G_1(t)(1 - G_2(M)) = \mathbf{P}(\zeta_1 \ge t,\ \zeta_2 \in [0, M))
\le P_1 \le \mathbf{P}(\zeta_1 \ge t - M) = G_1(t-M) \le (1+\varepsilon)^2c_1G(t)$$

by (A6.4.18); the term $P_2$ can be bounded in a similar way. Further,

$$P_4 = \mathbf{P}(\zeta_2 \ge M,\ \zeta_1 \ge t - M) = G_2(M)G_1(t-M) \le \varepsilon(1+\varepsilon)^2c_1G(t).$$


It remains to estimate $P_3$ (note that it is here that we will need the condition $G \in \mathcal{S}$;
so far we have only used the fact that $G \in \mathcal{L}$). We have

$$P_3 = \int_{[M, t-M)} G_2(t-y)\,G_1(dy) \le (1+\varepsilon)c_2\int_{[M, t-M)} G(t-y)\,G_1(dy), \tag{A6.4.20}$$

where it is clear that, by (A6.4.18), the last integral is equal to

$$\mathbf{P}(\zeta + \zeta_1 \ge t,\ \zeta_1 \in [M, t-M))$$
$$\le \mathbf{P}(\zeta \ge t - M,\ \zeta_1 \in [M, t-M)) + \mathbf{P}(\zeta + \zeta_1 \ge t,\ \zeta \in [M, t-M))$$
$$= G(t-M)\,G_1([M, t-M)) + \int_{[M, t-M)} G_1(t-y)\,G(dy)$$
$$\le \varepsilon(1+\varepsilon)G(t) + (1+\varepsilon)c_1\int_{[M, t-M)} G(t-y)\,G(dy). \tag{A6.4.21}$$

Now note that similarly to the above argument we can easily obtain (setting
$G_1 = G_2 = G$) that

$$G^{2*}(t) = (1 + \theta_1\varepsilon)\,2G(t) + \int_{[M, t-M)} G(t-y)\,G(dy) + \varepsilon(1 + \theta_2\varepsilon)G(t),$$

where $|\theta_i| \le 1$, $i = 1, 2$. Since $G^{2*}(t) \sim 2G(t)$ by virtue of $G \in \mathcal{S}_+$, this equality
means that the integral on the right-hand side is $o(G(t))$. Now (A6.4.21) immediately
implies that also $P_3 = o(G(t))$, and hence the required assertion is established
for the case $G \in \mathcal{S}_+$.




To extend the desired result to the case of distributions $G_i$ on $\mathbb{R}$, it suffices to
repeat the argument from the proof of Theorem A6.4.1(vi).

The case when one of the $c_i$ can be zero can be reduced to the case $c_1c_2 > 0$,
which has already been considered. If, say, $c_1 = 0$ and $c_2 > 0$, then we can introduce
the distribution $\widetilde{G}_1 := (G_1 + G)/2$, for which clearly $\widetilde{G}_1(t)/G(t) \to \widetilde{c}_1 = 1/2$, and
hence by the already proved assertion, as $t \to \infty$,

$$\frac{1}{2} + c_2 \sim \frac{\widetilde{G}_1 * G_2(t)}{G(t)} = \frac{G_1 * G_2(t) + G * G_2(t)}{2G(t)}
= \frac{G_1 * G_2(t)}{2G(t)} + (1 + o(1))\,\frac{1 + c_2}{2},$$

so that $G_1 * G_2(t)/G(t) \to c_2 = c_1 + c_2$.

(ii) Denote by $G_0^+$ the distribution of the random variable $\zeta_0^+$, where $\zeta_0 \mathrel{\subset\!\!\!=} G_0$.
Since $G_0^+(t) = G_0(t)$ for $t > 0$, it follows immediately from (i) with $G_1 = G_2 = G_0^+$
that $(G_0^+)^{2*}(t) \sim 2G_0^+(t)$, i.e. $G_0 \in \mathcal{S}$.

(iii) If $G \in \mathcal{S}$ then by Theorems A6.4.1(vi) and A6.4.2 we have, as $t \to \infty$,

$$G^{2*}(t) \sim (G^+)^{2*}(t) \sim 2G(t).$$

Now relation (A6.4.15) follows immediately from (i) by induction.
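(iv) Similarly to (A6.4.11), we have $G^{n*}(t) \le (G^+)^{n*}(t)$, $n \ge 1$. Therefore it is clear
that it suffices to consider the case $G \in \mathcal{S}_+$. Put

$$\alpha_n := \sup_{t \ge 0} \frac{G^{n*}(t)}{G(t)}.$$

Similarly to (A6.4.14), for $n \ge 2$, we have

$$G^{n*}(t) = G(t) + \int_0^t G^{(n-1)*}(t-y)\,G(dy),$$

and hence, for each $M > 0$,

$$\alpha_n \le 1 + \sup_{0 \le t \le M}\int_0^t \frac{G^{(n-1)*}(t-y)}{G(t)}\,G(dy)
+ \sup_{t > M}\int_0^t \frac{G^{(n-1)*}(t-y)}{G(t-y)}\,\frac{G(t-y)}{G(t)}\,G(dy)$$
$$\le 1 + \frac{1}{G(M)} + \alpha_{n-1}\sup_{t > M}\frac{G^{2*}(t) - G(t)}{G(t)}.$$

Since $G \in \mathcal{S}$, for any $\varepsilon > 0$ there exists an $M = M(\varepsilon)$ such that

$$\sup_{t > M}\frac{G^{2*}(t) - G(t)}{G(t)} < 1 + \varepsilon$$

and hence

$$\alpha_n \le b_0 + \alpha_{n-1}(1+\varepsilon), \quad b_0 := 1 + 1/G(M), \quad \alpha_1 = 1.$$

This recurrently implies

$$\alpha_n \le b_0 + b_0(1+\varepsilon) + \alpha_{n-2}(1+\varepsilon)^2 \le \cdots \le b_0\sum_{j=0}^{n-1}(1+\varepsilon)^j \le \frac{b_0}{\varepsilon}(1+\varepsilon)^n.$$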


The theorem is proved. 


Appendix 7 
The Proofs of Theorems on Convergence 
to Stable Laws 


In this appendix we will prove Theorems 8.8.1–8.8.4.


7.1 The Integral Limit Theorem 


In this section we will prove Theorem 8.8.1 on convergence of the distributions of
normalised sums $S_n = \sum_{k=1}^{n}\xi_k$ to stable laws. Recall the basic notation:

$$F_+(t) := \mathbf{P}(\xi \ge t), \qquad F_-(t) := \mathbf{P}(\xi < -t),$$
$$F_0(t) := F_+(t) + F_-(t) = \mathbf{P}(\xi \notin [-t, t)).$$

The main condition used in the theorem has this form:

$[\mathbf{R}_{\beta,\rho}]$ The total tail $F_0(x) = F_-(x) + F_+(x)$ is an r.v.f. as $x \to \infty$, i.e. can be
represented as

$$F_0(x) = x^{-\beta}L_{F_0}(x), \quad \beta \in (0, 2], \tag{A7.1.1}$$

where $L_{F_0}(x)$ is an s.v.f., and there exists the limit

$$\rho_+ := \lim_{x \to \infty}\frac{F_+(x)}{F_0(x)} \in [0, 1], \qquad \rho := 2\rho_+ - 1. \tag{A7.1.2}$$

In the case $\beta < 2$ we put

$$b(n) := F_0^{(-1)}(1/n), \tag{A7.1.3}$$

while for $\beta = 2$ we set

$$b(n) := Y^{(-1)}(1/n), \tag{A7.1.4}$$


where

$$Y(t) := 2t^{-2}\int_0^t yF_0(y)\,dy = t^{-2}\,\mathbf{E}(\xi^2;\ -t \le \xi < t) = t^{-2}L_Y(t), \tag{A7.1.5}$$


A.A. Borovkov, Probability Theory, Universitext, 687 
DOI 10.1007/978-1-4471-5201-9, © Springer-Verlag London 2013 




$L_Y(t)$ is an s.v.f., so that (see Theorem A6.2.1(v) of Appendix 6)
$b(n) = n^{1/\beta}L_b(n)$, where $L_b$ is an s.v.f.
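To make the scaling factor (A7.1.3) concrete, the sketch below computes $b(n)$ numerically for a hypothetical total tail $F_0(t) = \min(1, t^{-\beta})$, for which $b(n) = n^{1/\beta}$ exactly. The helper names and the bisection-based inverse are assumptions for illustration; for a continuous, strictly decreasing tail the result does not depend on which convention is used for the generalised inverse $F_0^{(-1)}$.

```python
def generalized_inverse(tail, level, lo=0.0, hi=1.0, iters=200):
    """Approximate inf{t >= lo : tail(t) <= level} for a nonincreasing tail
    that vanishes at infinity, by bracket expansion followed by bisection."""
    while tail(hi) > level:          # expand until tail(hi) <= level
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tail(mid) <= level:
            hi = mid
        else:
            lo = mid
    return hi

def total_tail(t, beta=1.5):
    # Hypothetical total tail F_0(t) = min(1, t^(-beta)); here L_{F_0} = 1 for t >= 1.
    return 1.0 if t <= 1.0 else t ** -beta

def b(n, beta=1.5):
    """Scaling factor b(n) = F_0^{(-1)}(1/n) for the hypothetical tail above."""
    return generalized_inverse(lambda t: total_tail(t, beta), 1.0 / n)

b_1000 = b(1000)   # compare with 1000 ** (1 / 1.5)
```

In this example $b(n)$ coincides with $n^{1/\beta}$, i.e. the slowly varying factor $L_b$ is identically 1.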


In the case when $F_+(t)$ and $F_-(t)$ are regularly varying functions (for instance,
when condition $[\mathbf{R}_{\beta,\rho}]$ is satisfied and $\rho = 0$), we will denote these functions
by $V(t)$ and $W(t)$, respectively, and put

$$V_I(t) := \int_0^t V(y)\,dy, \qquad V^I(t) := \int_t^\infty V(y)\,dy;$$

the same notational convention will be used for $W$.

If $F_+(t) = o(F_0(t))$ as $t \to \infty$ ($\rho = -1$), then $F_+(t)$ is not necessarily a regularly
varying function, but everything we say below regarding the sums $V(t) + W(t)$
and $V^I(t) + W^I(t)$ remains valid if we understand by their first summands quantities
negligibly small compared to the second summands (the first summands can also be
replaced by zeros). This is also true for the sums $V_I(t) + W_I(t)$, except for the case
when $\mathbf{E}\max(0, \xi)$ exists and $V_I(t)$ has to be replaced by $\mathbf{E}(\xi;\ \xi > 0) + o(1)$.


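Theorem A7.1.1 Let condition $[\mathbf{R}_{\beta,\rho}]$ be satisfied and $\zeta_n := S_n/b(n)$.
(i) For $\beta \in (0, 2)$, $\beta \ne 1$, and scaling factor (A7.1.3), as $n \to \infty$,

$$\zeta_n \Rightarrow \zeta^{(\beta,\rho)}, \tag{A7.1.6}$$

where the distribution $F_{\beta,\rho}$ of the random variable $\zeta^{(\beta,\rho)}$ depends on the parameters
$\beta$ and $\rho$ only and has ch.f.

$$\varphi^{(\beta,\rho)}(t) := \mathbf{E}e^{it\zeta^{(\beta,\rho)}} = \exp\{|t|^{\beta}B(\beta, \rho, \vartheta)\}, \tag{A7.1.7}$$

where $\vartheta := \operatorname{sign} t$,

$$B(\beta, \rho, \vartheta) = \Gamma(1-\beta)\Big[i\rho\vartheta\sin\frac{\beta\pi}{2} - \cos\frac{\beta\pi}{2}\Big] \tag{A7.1.8}$$

and, for $\beta \in (1, 2)$, we assume that $\Gamma(1-\beta) = \Gamma(2-\beta)/(1-\beta)$.

(ii) When $\beta = 1$, for the sequence $\zeta_n$ with scaling factor (A7.1.3) to converge to
a limiting law the former, generally speaking, needs to be centred. More precisely,
we have, as $n \to \infty$,

$$\zeta_n - A_n \Rightarrow \zeta^{(1,\rho)}, \tag{A7.1.9}$$

where

$$A_n = \frac{n}{b(n)}\big[V_I(b(n)) - W_I(b(n))\big] - \rho C, \tag{A7.1.10}$$

$C \approx 0.5772$ is the Euler constant, and

$$\varphi^{(1,\rho)}(t) = \mathbf{E}e^{it\zeta^{(1,\rho)}} = \exp\Big\{-\frac{\pi|t|}{2} - i\rho t\ln|t|\Big\}. \tag{A7.1.11}$$

If $n[V_I(b(n)) - W_I(b(n))] = o(b(n))$, then $\rho = 0$ and we can put $A_n = 0$.
If there exists $\mathbf{E}\xi = 0$ then

$$A_n = \frac{n}{b(n)}\big[W^I(b(n)) - V^I(b(n))\big] - \rho C.$$

If $\mathbf{E}\xi = 0$ and $\rho \ne 0$ then $\rho A_n \to -\infty$ as $n \to \infty$.

(iii) For $\beta = 2$ and scaling factor (A7.1.4), as $n \to \infty$,

$$\zeta_n \Rightarrow \zeta^{(2,\rho)}, \qquad \varphi^{(2,\rho)}(t) := \mathbf{E}e^{it\zeta^{(2,\rho)}} = e^{-t^2/2},$$

so that $\zeta^{(2,\rho)}$ has the standard normal distribution which does not depend on $\rho$.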
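The limiting characteristic functions in the theorem are explicit and can be evaluated directly. The following sketch is one possible way to code (A7.1.7), (A7.1.8) and (A7.1.11); the function names are hypothetical, and the treatment of $\beta \in (1, 2)$ relies on the fact that Python's `math.gamma` accepts negative non-integer arguments, where it agrees with the convention $\Gamma(1-\beta) = \Gamma(2-\beta)/(1-\beta)$.

```python
import cmath
import math

def B(beta, rho, theta):
    """B(beta, rho, theta) from (A7.1.8), for beta in (0, 1) or (1, 2)."""
    g = math.gamma(1.0 - beta)
    half = beta * math.pi / 2.0
    return g * complex(-math.cos(half), rho * theta * math.sin(half))

def stable_chf(t, beta, rho):
    """Limiting ch.f.: (A7.1.7) for beta != 1, (A7.1.11) for beta = 1,
    and the standard normal ch.f. for beta = 2."""
    if t == 0.0:
        return complex(1.0, 0.0)
    if beta == 2.0:
        return complex(math.exp(-t * t / 2.0), 0.0)
    if beta == 1.0:
        return cmath.exp(complex(-math.pi * abs(t) / 2.0,
                                 -rho * t * math.log(abs(t))))
    theta = 1.0 if t > 0 else -1.0
    return cmath.exp(abs(t) ** beta * B(beta, rho, theta))

# Example: a totally skewed law with beta = 1.5 and the symmetric Cauchy-type case.
value_skewed = stable_chf(0.7, 1.5, 1.0)
value_cauchy = stable_chf(0.7, 1.0, 0.0)   # equals exp(-pi * 0.7 / 2)
```

Two quick sanity checks are that the modulus never exceeds 1 and that $\varphi(-t)$ is the complex conjugate of $\varphi(t)$, as must hold for any characteristic function.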


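Proof We will use the same approach as in the proof of the central limit theorem
using relation (8.8.1). We will study the asymptotic properties of the ch.f. $\varphi(t) =
\mathbf{E}e^{it\xi}$ in the vicinity of zero (more precisely, the asymptotics of

$$\varphi\Big(\frac{\mu}{b(n)}\Big) - 1 \to 0$$

as $b(n) \to \infty$) and show that, under condition $[\mathbf{R}_{\beta,\rho}]$, for each $\mu \in \mathbb{R}$, we have

$$n\Big(\varphi\Big(\frac{\mu}{b(n)}\Big) - 1\Big) \to \ln\varphi^{(\beta,\rho)}(\mu) \quad \text{as } n \to \infty \tag{A7.1.12}$$

(or some modification of this relation, see (A7.1.48)). This will imply that, for $\zeta_n =
S_n/b(n)$, as $n \to \infty$, there holds the relation (cf. Lemma 8.3.2)

$$\varphi_{\zeta_n}(\mu) \to \varphi^{(\beta,\rho)}(\mu). \tag{A7.1.13}$$

Indeed,

$$\varphi_{\zeta_n}(\mu) = \varphi^n\Big(\frac{\mu}{b(n)}\Big).$$

Since $\varphi(t) \to 1$ as $t \to 0$, one has

$$\ln\varphi_{\zeta_n}(\mu) = n\ln\varphi\Big(\frac{\mu}{b(n)}\Big)
= n\ln\Big[1 + \Big(\varphi\Big(\frac{\mu}{b(n)}\Big) - 1\Big)\Big]
= n\Big[\varphi\Big(\frac{\mu}{b(n)}\Big) - 1\Big] + R_n,$$

where $|R_n| \le n|\varphi(\mu/b(n)) - 1|^2$ for all $n$ large enough, and hence $R_n \to 0$ by virtue
of (A7.1.12). It follows that (A7.1.12) implies (A7.1.13).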
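The bound on the remainder $R_n$ rests on the elementary inequality $|\ln(1+z) - z| \le |z|^2$ for complex $z$ with $|z| \le 1/2$. The sketch below checks this inequality numerically on a few circles; the function names and the grid are assumptions for illustration.

```python
import cmath

def log_remainder(z):
    """R(z) = ln(1 + z) - z for complex z with |z| < 1 (principal branch)."""
    return cmath.log(1.0 + z) - z

def worst_ratio(radius, points=360):
    """Largest value of |R(z)| / |z|^2 over a grid on the circle |z| = radius."""
    worst = 0.0
    for k in range(points):
        z = cmath.rect(radius, 2.0 * cmath.pi * k / points)
        worst = max(worst, abs(log_remainder(z)) / abs(z) ** 2)
    return worst

ratios = [worst_ratio(r) for r in (0.5, 0.1, 0.01)]
```

With $z = \varphi(\mu/b(n)) - 1$, which is of order $1/n$ under (A7.1.12), this gives $|R_n| \le n|z|^2 = O(1/n)$.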

So first we will study the asymptotics of $\varphi(t)$ as $t \to 0$ and then
establish (A7.1.12).


(i) First let $\beta \in (0, 1)$. We have

$$\varphi(t) = -\int_0^\infty e^{itx}\,dV(x) - \int_0^\infty e^{-itx}\,dW(x). \tag{A7.1.14}$$

Consider the former integral

$$-\int_0^\infty e^{itx}\,dV(x) = V(0) + it\int_0^\infty e^{itx}V(x)\,dx, \tag{A7.1.15}$$

where the substitution $|t|x = y$, $|t| = 1/m$ yields

$$I_+(t) := it\int_0^\infty e^{itx}V(x)\,dx = i\vartheta\int_0^\infty e^{i\vartheta y}V(my)\,dy, \tag{A7.1.16}$$

$\vartheta = \operatorname{sign} t$ (we will henceforth exclude the trivial case $t = 0$).
Assume for the present that $\rho_+ > 0$. Then $V(x)$ is an r.v.f. as $x \to \infty$ and, for
each $y$, by virtue of the properties of s.v.f.s we have, as $|t| \to 0$,

$$V(my) \sim y^{-\beta}V(m).$$

Therefore it is natural to expect that, as $|t| \to 0$,

$$I_+(t) \sim i\vartheta V(m)\int_0^\infty e^{i\vartheta y}y^{-\beta}\,dy = i\vartheta V(m)A(\beta, \vartheta), \tag{A7.1.17}$$

where

$$A(\beta, \vartheta) := \int_0^\infty e^{i\vartheta y}y^{-\beta}\,dy. \tag{A7.1.18}$$

Assume that relation (A7.1.17) holds and similarly (in the case when $\rho_- > 0$)

$$-\int_0^\infty e^{-itx}\,dW(x) = W(0) + I_-(t), \tag{A7.1.19}$$

where

$$I_-(t) := -it\int_0^\infty e^{-itx}W(x)\,dx \sim -i\vartheta W(m)\int_0^\infty e^{-i\vartheta y}y^{-\beta}\,dy
= -i\vartheta W(m)A(\beta, -\vartheta). \tag{A7.1.20}$$

Since $V(0) + W(0) = 1$, relations (A7.1.14)-(A7.1.20) mean that, as $t \to 0$,

$$\varphi(t) = 1 + F_0(m)\,i\vartheta\big[\rho_+A(\beta, \vartheta) - \rho_-A(\beta, -\vartheta)\big](1 + o(1)). \tag{A7.1.21}$$


We can find an explicit form of the integral $A(\beta, \vartheta)$. Observe that the integral
of the function $e^{iz}z^{-\beta}$ along the boundary of the positive quadrant (closed as a
contour) in the complex plane is equal to zero. From this it is not
hard to obtain that

$$A(\beta, \vartheta) = \Gamma(1-\beta)e^{i\vartheta(1-\beta)\pi/2}, \quad \beta > 0. \tag{A7.1.22}$$


(Note also that (A7.1.18) is a table integral and its value can be found in handbooks, 
see, e.g., integrals 3.761.4 and 3.761.9 in [18].) 
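Thus, in (A7.1.21) one has

$$i\vartheta\big[\rho_+A(\beta, \vartheta) - \rho_-A(\beta, -\vartheta)\big]
= i\vartheta\,\Gamma(1-\beta)\Big[\rho_+\cos\frac{(1-\beta)\pi}{2} + i\vartheta\rho_+\sin\frac{(1-\beta)\pi}{2}
- \rho_-\cos\frac{(1-\beta)\pi}{2} + i\vartheta\rho_-\sin\frac{(1-\beta)\pi}{2}\Big]$$
$$= \Gamma(1-\beta)\Big[i\vartheta(\rho_+ - \rho_-)\cos\frac{(1-\beta)\pi}{2} - \sin\frac{(1-\beta)\pi}{2}\Big]$$
$$= \Gamma(1-\beta)\Big[i\rho\vartheta\sin\frac{\beta\pi}{2} - \cos\frac{\beta\pi}{2}\Big] = B(\beta, \rho, \vartheta),$$

where $B(\beta, \rho, \vartheta)$ is defined in (A7.1.8). Hence, as $t \to 0$,

$$\varphi(t) - 1 = F_0(m)B(\beta, \rho, \vartheta)(1 + o(1)). \tag{A7.1.23}$$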
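The algebra leading from the closed form (A7.1.22) to $B(\beta, \rho, \vartheta)$ can be double-checked numerically. The sketch below compares both sides for several parameter values with $\beta \in (0, 1)$; the function names are hypothetical and the relations $\rho_- = 1 - \rho_+$, $\rho = 2\rho_+ - 1$ are taken from (A7.1.2).

```python
import cmath
import math

def A(beta, theta):
    """Closed form (A7.1.22): Gamma(1 - beta) * exp(i * theta * (1 - beta) * pi / 2)."""
    return math.gamma(1.0 - beta) * cmath.exp(1j * theta * (1.0 - beta) * math.pi / 2.0)

def B(beta, rho, theta):
    """B(beta, rho, theta) as defined in (A7.1.8)."""
    half = beta * math.pi / 2.0
    return math.gamma(1.0 - beta) * complex(-math.cos(half), rho * theta * math.sin(half))

def bracket(beta, rho_plus, theta):
    """i * theta * [rho_+ A(beta, theta) - rho_- A(beta, -theta)] from (A7.1.21)."""
    rho_minus = 1.0 - rho_plus
    return 1j * theta * (rho_plus * A(beta, theta) - rho_minus * A(beta, -theta))

max_gap = max(
    abs(bracket(beta, rp, th) - B(beta, 2.0 * rp - 1.0, th))
    for beta in (0.2, 0.5, 0.9)
    for rp in (0.0, 0.3, 1.0)
    for th in (1.0, -1.0)
)
```

The largest discrepancy `max_gap` is at the level of floating-point rounding, which is consistent with the identity derived above.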


Putting $t = \mu/b(n)$ (so that $m = b(n)/|\mu|$), where $b(n)$ is defined in (A7.1.3), and
taking into account that $F_0(b(n)) \sim 1/n$, we obtain

$$n\Big[\varphi\Big(\frac{\mu}{b(n)}\Big) - 1\Big] = nF_0\Big(\frac{b(n)}{|\mu|}\Big)B(\beta, \rho, \vartheta)(1 + o(1)) \sim |\mu|^{\beta}B(\beta, \rho, \vartheta). \tag{A7.1.24}$$

We have established the validity of (A7.1.12) and therefore that of assertion (i) of
the theorem in the case $\beta < 1$, $\rho_\pm > 0$.

If $\rho_+ = 0$ ($\rho_- = 0$) then, as was already mentioned, the above argument remains
valid if we replace $V(m)$ ($W(m)$) by zero. This follows from the fact that in this
case $F_+(t)$ ($F_-(t)$) admits a regularly varying majorant $V^*(t) = o(W(t))$ ($W^*(t) =
o(V(t))$).

It remains only to justify the asymptotic equivalence in (A7.1.17). To do that, it
is sufficient to verify that the integrals

$$\int_0^{\varepsilon} e^{i\vartheta y}V(my)\,dy, \qquad \int_M^\infty e^{i\vartheta y}V(my)\,dy \tag{A7.1.25}$$

can be made arbitrarily small compared to $V(m)$ by choosing appropriate $\varepsilon$ and $M$.
Note first that by Theorem A6.2.1(iii) of Appendix 6 (see (A6.1.2) in Appendix 6),
for any $\delta > 0$, there exists an $x_\delta > 0$ such that, for all $v \le 1$ and $vx \ge x_\delta$, we have

$$\frac{V(vx)}{V(x)} \le (1+\delta)v^{-\beta-\delta}.$$

Therefore, for $\delta < 1 - \beta$ and $x > x_\delta$,

$$\int_0^x V(u)\,du \le x_\delta + \int_{x_\delta}^x V(u)\,du = x_\delta + xV(x)\int_{x_\delta/x}^1 \frac{V(vx)}{V(x)}\,dv$$
$$\le x_\delta + xV(x)(1+\delta)\int_0^1 v^{-\beta-\delta}\,dv
= x_\delta + \frac{xV(x)(1+\delta)}{1-\beta-\delta} \le cxV(x) \tag{A7.1.26}$$

since $xV(x) \to \infty$ as $x \to \infty$. It follows that

$$\Big|\int_0^{\varepsilon} e^{i\vartheta y}V(my)\,dy\Big| \le \frac{1}{m}\int_0^{\varepsilon m} V(u)\,du \le c\varepsilon V(\varepsilon m) \sim c\varepsilon^{1-\beta}V(m).$$

Since $\varepsilon^{1-\beta} \to 0$ as $\varepsilon \to 0$, the first assertion in (A7.1.25) is proved. The second
integral in (A7.1.25) is equal to

$$\int_M^\infty e^{i\vartheta y}V(my)\,dy = \frac{1}{i\vartheta}\int_M^\infty V(my)\,de^{i\vartheta y}
= -\frac{1}{i\vartheta}e^{i\vartheta M}V(mM) - \frac{1}{i\vartheta}\int_M^\infty e^{i\vartheta y}\,dV(my)$$
$$= -\frac{1}{i\vartheta}e^{i\vartheta M}V(mM) - \frac{1}{i\vartheta}\int_{mM}^\infty e^{i\vartheta x/m}\,dV(x),$$

so its absolute value does not exceed

$$2V(mM) \sim 2M^{-\beta}V(m) \tag{A7.1.27}$$

as $m \to \infty$. Hence the value of the second integral in (A7.1.25) can also be made
arbitrarily small compared to $V(m)$ by choosing an appropriate $M$. Relation (A7.1.17)
together with the assertion of the theorem in the case $\beta < 1$ are proved.

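Let now $\beta \in (1, 2)$ and hence there exist a finite expectation $\mathbf{E}\xi$ which, according
to our condition, will be assumed to be equal to zero. In this case,

$$\varphi(t) - 1 = \vartheta\int_0^{|t|}\varphi'(\vartheta u)\,du, \quad \vartheta = \operatorname{sign} t, \tag{A7.1.28}$$

and we have to find the asymptotic behaviour of

$$\varphi'(t) = -i\int_0^\infty xe^{itx}\,dV(x) + i\int_0^\infty xe^{-itx}\,dW(x) =: I_+^{(1)}(t) + I_-^{(1)}(t) \tag{A7.1.29}$$

as $t \to 0$. Since $x\,dV(x) = d(xV(x)) - V(x)\,dx$, integration by parts yields

$$I_+^{(1)}(t) := -i\int_0^\infty xe^{itx}\,dV(x) = -i\int_0^\infty e^{itx}\,d(xV(x)) + i\int_0^\infty e^{itx}V(x)\,dx$$
$$= -t\int_0^\infty xV(x)e^{itx}\,dx + iV^I(0) - t\int_0^\infty V^I(x)e^{itx}\,dx
= iV^I(0) - t\int_0^\infty \widetilde{V}(x)e^{itx}\,dx, \tag{A7.1.30}$$

where, by Theorem A6.2.1(iv) of Appendix 6, both functions

$$V^I(x) = \int_x^\infty V(u)\,du \sim \frac{xV(x)}{\beta-1} \quad \text{as } x \to \infty, \quad V^I(0) < \infty,$$

and

$$\widetilde{V}(x) := xV(x) + V^I(x) \sim \frac{\beta xV(x)}{\beta-1}$$

are regularly varying.
Letting, as before, $m = 1/|t|$, $m \to \infty$ (cf. (A7.1.16), (A7.1.17)), we get

$$-t\int_0^\infty \widetilde{V}(x)e^{itx}\,dx = -\vartheta\int_0^\infty \widetilde{V}(my)e^{i\vartheta y}\,dy
\sim -\vartheta\widetilde{V}(m)A(\beta-1, \vartheta) \sim -\frac{\beta V(m)}{t(\beta-1)}A(\beta-1, \vartheta),$$

$$I_+^{(1)}(t) = iV^I(0) - \frac{\beta\rho_+F_0(m)}{t(\beta-1)}A(\beta-1, \vartheta)(1 + o(1)), \tag{A7.1.31}$$

where the function $A(\beta, \vartheta)$ defined in (A7.1.18) is equal to (A7.1.22).
Similarly,

$$I_-^{(1)}(t) := i\int_0^\infty xe^{-itx}\,dW(x)
= -t\int_0^\infty xW(x)e^{-itx}\,dx - iW^I(0) - t\int_0^\infty W^I(x)e^{-itx}\,dx$$
$$= -iW^I(0) - t\int_0^\infty \widetilde{W}(x)e^{-itx}\,dx,$$

where

$$W^I(x) := \int_x^\infty W(u)\,du, \qquad \widetilde{W}(x) := xW(x) + W^I(x) \sim \frac{\beta xW(x)}{\beta-1},$$

and

$$-t\int_0^\infty \widetilde{W}(x)e^{-itx}\,dx \sim -\frac{\beta W(m)}{t(\beta-1)}A(\beta-1, -\vartheta).$$

Therefore

$$I_-^{(1)}(t) = -iW^I(0) - \frac{\beta\rho_-F_0(m)}{t(\beta-1)}A(\beta-1, -\vartheta)(1 + o(1))$$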


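and hence, by virtue of (A7.1.29), (A7.1.31), and the equality $V^I(0) - W^I(0) =
\mathbf{E}\xi = 0$, we have

$$\varphi'(t) = -\frac{\beta F_0(m)}{t(\beta-1)}\big[\rho_+A(\beta-1, \vartheta) + \rho_-A(\beta-1, -\vartheta)\big](1 + o(1)).$$

We return now to relation (A7.1.28). Since

$$\int_0^{|t|} u^{-1}F_0(u^{-1})\,du \sim \beta^{-1}F_0(|t|^{-1}) = \beta^{-1}F_0(m)$$

(see Theorem A6.2.1(iii) of Appendix 6), we obtain, again using (A7.1.22) and an
argument similar to the one in the proof for the case $\beta < 1$, that

$$\varphi(t) - 1 = -\frac{1}{\beta-1}F_0(m)\big[\rho_+A(\beta-1, \vartheta) + \rho_-A(\beta-1, -\vartheta)\big](1 + o(1))$$
$$= -\frac{\Gamma(2-\beta)}{\beta-1}F_0(m)\Big[\rho_+\Big(\cos\frac{(2-\beta)\pi}{2} + i\vartheta\sin\frac{(2-\beta)\pi}{2}\Big)
+ \rho_-\Big(\cos\frac{(2-\beta)\pi}{2} - i\vartheta\sin\frac{(2-\beta)\pi}{2}\Big)\Big](1 + o(1))$$
$$= \frac{\Gamma(2-\beta)}{\beta-1}F_0(m)\Big[\cos\frac{\beta\pi}{2} - i\vartheta\rho\sin\frac{\beta\pi}{2}\Big](1 + o(1))
= F_0(m)B(\beta, \rho, \vartheta)(1 + o(1)). \tag{A7.1.32}$$

We arrive once again at relation (A7.1.23) which, by virtue of (A7.1.24), implies the
assertion of the theorem for $\beta \in (1, 2)$.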

(ii) Case $\beta = 1$. In this case, the computation is somewhat more complicated. We
again follow relations (A7.1.14)-(A7.1.16), according to which

$$\varphi(t) = 1 + I_+(t) + I_-(t). \tag{A7.1.33}$$

Rewrite expression (A7.1.16) for $I_+(t)$ as

$$I_+(t) = i\vartheta\int_0^\infty e^{i\vartheta y}V(my)\,dy = i\vartheta\int_0^\infty V(my)\cos y\,dy - \int_0^\infty V(my)\sin y\,dy, \tag{A7.1.34}$$

where the first integral on the right-hand side can be represented as the sum of two
integrals:

$$\int_0^1 V(my)\,dy + \int_0^\infty g(y)V(my)\,dy, \tag{A7.1.35}$$

$$g(y) = \begin{cases} \cos y - 1 & \text{if } y \le 1, \\ \cos y & \text{if } y > 1. \end{cases} \tag{A7.1.36}$$

Note that (see, e.g., integral 3.782 in [18]) the value of the integral

$$-\int_0^\infty g(y)y^{-1}\,dy = C \approx 0.5772 \tag{A7.1.37}$$

is the Euler constant. Since $V(ym)/V(m) \to y^{-1}$ as $m \to \infty$, similarly to the above
argument we obtain for the second integral in (A7.1.35) the relation

$$\int_0^\infty g(y)V(my)\,dy \sim -CV(m). \tag{A7.1.38}$$
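The value of the integral in (A7.1.37) can be confirmed by direct quadrature. The sketch below is an illustration under stated assumptions: the function names, the composite Simpson rule, the truncation point $T = 400\pi$, and the tail correction $1/T^2$ (the leading term of $\int_T^\infty y^{-1}\cos y\,dy$ after two integrations by parts, valid when $\sin T = 0$ and $\cos T = 1$) are choices made here, not part of the text.

```python
import math

def head_integrand(y):
    """g(y)/y on [0, 1], where g(y) = cos(y) - 1; the limit at y = 0 is 0."""
    return 0.0 if y == 0.0 else (math.cos(y) - 1.0) / y

def body_integrand(y):
    """g(y)/y on (1, inf), where g(y) = cos(y)."""
    return math.cos(y) / y

def simpson(f, a, b, n):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return total * h / 3.0

def euler_constant_estimate(periods=200, steps_per_unit=100):
    upper = 2.0 * math.pi * periods      # sin(upper) = 0 and cos(upper) = 1
    head = simpson(head_integrand, 0.0, 1.0, 2 * steps_per_unit)
    n_body = 2 * int(steps_per_unit * (upper - 1.0) / 2.0 + 1)
    body = simpson(body_integrand, 1.0, upper, n_body)
    tail = 1.0 / upper ** 2              # approximate contribution of (upper, inf)
    return -(head + body + tail)

estimate = euler_constant_estimate()
```

The estimate agrees with the Euler constant $0.5772\ldots$ to several decimal places, in line with (A7.1.37).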


Consider now the first integral in (A7.1.35):

$$\int_0^1 V(my)\,dy = m^{-1}\int_0^m V(u)\,du = m^{-1}V_I(m), \tag{A7.1.39}$$

where

$$V_I(t) := \int_0^t V(u)\,du \tag{A7.1.40}$$

can easily be seen to be an s.v.f. in the case $\beta = 1$ (see Theorem A6.2.1(iv) of
Appendix 6). Here if $\mathbf{E}|\xi| = \infty$ then $V_I(x) \to \infty$ as $x \to \infty$, and if $\mathbf{E}|\xi| < \infty$ then
$V_I(x) \to V_I(\infty) < \infty$.
Thus, for the first term on the right-hand side of (A7.1.34) we have

$$\operatorname{Im} I_+(t) = \vartheta\big(-CV(m) + m^{-1}V_I(m)\big) + o(V(m)). \tag{A7.1.41}$$

Now we will determine how $V_I(vx)$ depends on $v$ as $x \to \infty$. For any fixed $v > 0$,

$$V_I(vx) = V_I(x) + \int_x^{vx} V(u)\,du = V_I(x) + xV(x)\int_1^v \frac{V(yx)}{V(x)}\,dy.$$

By Theorem A6.2.1 of Appendix 6,

$$\int_1^v \frac{V(yx)}{V(x)}\,dy \sim \int_1^v \frac{dy}{y} = \ln v,$$

so that

$$V_I(vx) = V_I(x) + (1 + o(1))xV(x)\ln v =: A_V(v, x) + xV(x)\ln v, \tag{A7.1.42}$$

where evidently

$$A_V(v, x) = V_I(x) + o(xV(x)) \quad \text{as } x \to \infty \tag{A7.1.43}$$

and $V_I(x) \gg xV(x)$ by Theorem A6.2.1(iv) of Appendix 6.


Therefore, for $t = \mu/b(n)$ (so that $m = b(n)/|\mu|$ and hence $V(m) \sim \rho_+|\mu|/n$),
we obtain from (A7.1.41) and (A7.1.42) (where one has to put $x = b(n)$, $v = 1/|\mu|$)
that the following representation is valid as $n \to \infty$:

$$\operatorname{Im} I_+(t) = -\frac{C\rho_+\mu}{n} + \frac{\mu}{b(n)}V_I\big(|\mu|^{-1}b(n)\big) + o(n^{-1})$$
$$= \frac{\mu}{b(n)}A_V\big(|\mu|^{-1}, b(n)\big) - \frac{\rho_+\mu}{n}\big(C + \ln|\mu|\big) + o(n^{-1}). \tag{A7.1.44}$$

For the second term on the right-hand side of (A7.1.34) we have

$$\operatorname{Re} I_+(t) = -\int_0^\infty V(my)\sin y\,dy \sim -V(m)\int_0^\infty y^{-1}\sin y\,dy.$$

Because $\sin y \sim y$ as $y \to 0$, the last integral converges. Since $\Gamma(\gamma) \sim 1/\gamma$ as $\gamma \to 0$,
the value of this integral can be found to be (see (A7.1.18) and (A7.1.22))

$$\lim_{\gamma \to 0}\Gamma(\gamma)\sin\frac{\gamma\pi}{2} = \frac{\pi}{2}. \tag{A7.1.45}$$

Thus, for $t = \mu/b(n)$,

$$\operatorname{Re} I_+(t) = -\frac{\pi\rho_+|\mu|}{2n} + o(n^{-1}). \tag{A7.1.46}$$


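In a similar way we can find an asymptotic representation for the integral $I_-(t)$
(see (A7.1.14)-(A7.1.20)):

$$I_-(t) := -i\vartheta\int_0^\infty W(my)e^{-i\vartheta y}\,dy
= -i\vartheta\int_0^\infty W(my)\cos y\,dy - \int_0^\infty W(my)\sin y\,dy.$$

Comparing this with (A7.1.34) and the subsequent computation of $I_+(t)$, we can
immediately conclude that, for $t = \mu/b(n)$ (cf. (A7.1.44), (A7.1.46)),

$$\operatorname{Im} I_-(t) = -\frac{\mu}{b(n)}A_W\big(|\mu|^{-1}, b(n)\big) + \frac{\rho_-\mu}{n}\big(C + \ln|\mu|\big) + o(n^{-1}),$$
$$\operatorname{Re} I_-(t) = -\frac{\pi\rho_-|\mu|}{2n} + o(n^{-1}). \tag{A7.1.47}$$

Thus relations (A7.1.33), (A7.1.44), (A7.1.46) and (A7.1.47) imply

$$\varphi\Big(\frac{\mu}{b(n)}\Big) - 1 = -\frac{\pi|\mu|}{2n} - \frac{i\rho\mu}{n}\big(C + \ln|\mu|\big)
+ \frac{i\mu}{b(n)}\Big[A_V\big(|\mu|^{-1}, b(n)\big) - A_W\big(|\mu|^{-1}, b(n)\big)\Big] + o(n^{-1}).$$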


7.1 The Integral Limit Theorem 697 


It follows from (A7.1.43) that the penultimate term here is equal to 


=" [vi(bqn)) - W7(b(n))] +0(n—'), 


re ) 
so that finally, 
id ||  ipp i : 
(55) 1=- > — ——Inlul +in— +o(n *) (A7.1.48) 
where 
An= a Bal) - W(b(n))] — 


Therefore, similarly to (A7.1.12) and (A7.1.13), we obtain 


Gey —An (Ht) = cindrge( A) = exp| ind, + na + (o(4) - 1)]| 
2 
of. wy) ee} 
=e ian +n(o( es) 1) +n0(lo(5t;) |) 


As, for 6 = 1, by Theorem A6.2.1(iv) of Appendix 6, the functions V; and W, are 
slowly varying, by (A7.1.48) one has 


m : 1 A? 1 2 
nlo( oo) 1 ae (7+) a(>+ ve rap lv (om)? + W1(b(n)) |) +o, 


Since clearly 
; be mel. 
“inn +n(o( i) > 7 iow in|, 


I | {| 
2 


we have 
Pas-as(t) > exp = = iowintull, 


so relation (A7.1.9) is proved. The subsequent assertions regarding the centring
sequence $\{A_n\}$ are evident.


(iii) It remains to consider the case $\beta = 2$. We will follow representations
(A7.1.28)-(A7.1.30), according to which we have to find, as $m = 1/|t| \to \infty$, the
asymptotics of


gH) =I M+, (A7.1.49) 
where 


[o.@) [o@) 
Wo =ivio =r f Peet ax=iv'o)—o f Vimy)e!”” dy 
0 0 
(A7.1.50) 




and, by Theorem A6.2.1(iv) of Appendix 6, 
oo ~ 
Vi (x) = VOo)dy~xV(x), Viw~)=xV(x)+ V1 (x) ~ 2xV(x) (A7.1.51) 
x 
as x — oo. Further, 
CO oy, ; [o,@) oa CO a 
/ Vimy) dy = f V(my) cos ydy + of V(my)sinydy. (A7.1.52) 
0 0 0 


Here the second integral on the right-hand side is asymptotically equivalent, as
$m \to \infty$, to (see (A7.1.45))


oS oo at ee 
Fem f y sin y dy = = V(m). 
0 2 
The first integral on the right-hand side of (A7.1.52) is equal to 
I lee) ce 
i Vimy) dy +f g(y)V(my) dy, 
0 0 
where the function g(y) was defined in (A7.1.35), and 
1 a 1 me lw 
[ Fomay== fo Vedu= = Fim, 
0 m Jo m 


V1 (x) := Jo V (u) du being an s.v.f. by (A7.1.51). Since 


x 2 x 
/ uV(u)du = eV al u-dV(u), 
0 2 2 Jo 


[ Vi @)du=xViG)+ [va 
0 0 


and V! (x) ~ xV(x), we have 
x 
Vi(x) = / (wV (wu) + V'(u)) du 
0 
x 
=xV! (x) +x°V(x) -| u> dV (u) 
0 
x 
= 2 2 
--| u dV(y) + O(x V(x)), (A7.1.53) 
0 
where the last term is negligibly small, because 


[ uV(u)du> x°V(x) 
0 


(see Theorem A6.2.1(iv) of Appendix 6). 


It is also clear that, as $x \to \infty$,
V V 2 
Vi (x) > Vi (co) = E(é ;éE> 0) € (0, ov]. 


As a result, we obtain (see also (A7.1.38)) 


Ops —ayl it 7 s . 
LIV (0) — = Vim) — tVi(m) + 0CV (mn) + o(V(m)) 
=iV'() —tV;(m)(1+0(1)) 


since V;(x) > tV (x). 
Quite similarly we get 


1) = iW! (0) — tW)(m)(1 + 0()), 


where Wr is an s.v.f. which is obtained from the function W in the same way as V 
from V. Since V/ (0) = W/ (0), relation (A7.1.49) now yields that 


g(t) = —t[ Vim) + Wim) ](1 + 0(1)). 


Hence from (A7.1.28) we obtain the representation 


1/m 1/m ~ = 
oy-1=o | eowdu=— f u[Vi(1/u) + Wr(1/u)| du 
1 ow ~ 1 
~ =x 5LVilm) + Wi(m)] ~ —GE(E*; —m = & <m) 


by virtue of (A7.1.53) and a similar relation for Wr. Turning now to the definition 
of the function Y (x) = x? Ly (x) in (A7.1.5) and putting 


b(n) :=YCV(/n), t=p/b(n), 


we get 


2 


2 
~-"5 y(bin)) > 7 


n(p() — 1) ~ 5 ¥(b@)/In/) 5 


The theorem is proved. 


7.2 The Integro-Local and Local Limit Theorems 


In this section we will prove Theorems 8.8.2—8.8.4. We will begin with the integro- 
local theorem. 




Theorem A7.2.1 (Integro-local Stone’s theorem) Let $\xi$ be a non-lattice random
variable and the conditions of Theorem A7.1.1 be satisfied. Then, for each fixed
$\Delta > 0$,
\[ P\bigl(S_n \in \Delta[x)\bigr) = \frac{\Delta}{b(n)}\, f^{(\beta,\rho)}\Bigl(\frac{x}{b(n)}\Bigr) + o\Bigl(\frac{1}{b(n)}\Bigr) \quad \text{as } n \to \infty, \]
where the remainder term $o(1/b(n))$ is uniform in $x$.


Proof of Theorem A7.2.1 The proof is analogous to the proof of Theorem 8.7.1. We
will again use the smoothing approach and consider, along with the sums $S_n$, the
sums
\[ Z_n = S_n + \theta\eta, \]
where $\theta = \mathrm{const}$ and $\eta$ is chosen so that its ch.f. is equal to 0 outside a finite
interval. For instance, we can choose $\eta$ as in Sect. 8.7.3, i.e., with the ch.f.
$\varphi_\eta(t) = \max(0, 1 - |t|)$. Then equality (8.7.19) will still be valid with the same
decomposition of the integral on its right-hand side into the subintegral $I_1$ over the
domain $|t| < \gamma$ and $I_2$ over the domain $\gamma \le |t| \le 1/\theta$. Here estimating $I_2$ can be
done in the same way as in Theorem 8.7.1.

For the sake of brevity, put Q(t) := Gna (t)@on(t). Then, for the integral J; with 
x = vb(n), we have 


haf ee mpwnar= a (Ga co) 7 
_ \t|<y e 4 ~ b(n) |ul|<yb(n) . b(n) e b(n) ; 
(A7.2.1) 


As was shown in the proof of Theorem 8.1.1, for each u we have 


(i) > oP (u) asn— oo, 
n 


and, moreover, for some $c > 0$ and $\gamma > 0$ small enough, by virtue of, say, (A7.1.23)
and (A7.1.32), we have


Re(g(t) — 1) <-c ri(=). 


It 


and, for any ¢ > 0 and all n large enough, 


ne(o( 1) < nfy(“) « c|ulF-*. 


Here we used the properties of the r.v.f. Fo. Moreover, 


The above also implies that, for all wu such that |u| < yb(n), 


n Uu 
(os) 


The obtained relations mean that we can use the dominated convergence theorem 
in (A7.2.1) which implies 


<e ew 


(A7.2.2) 


lim b(n) = [mew au (A7.2.3) 
noo 


uniformly in $v$, since the right-hand side of (A7.2.1) is uniformly continuous in $v$.
The right-hand side of (A7.2.3) is the result of applying the inversion
formula (up to the factor $1/(2\pi)$) to the ch.f. $\varphi^{(\beta,\rho)}$. This means that


lim b(n) = 22 f?)(v). 
noo 


We have established that, for $x = vb(n)$, as $n \to \infty$,


_~ A po) { _*_ =— 
hE) al al Ga) +s) 


uniformly in v (and hence in x). 
To prove the theorem it remains to use Lemma 8.7.1. 
The theorem is proved. 


The proofs of the local Theorems 8.8.3 and 8.8.4 can be obtained by an obvious 
similar modification of the proofs of Theorems 8.7.2 and 8.7.3 under the conditions 
of Theorem 8.8.1. 


Appendix 8 

Upper and Lower Bounds for the Distributions 
of the Sums and the Maxima of the Sums 

of Independent Random Variables 


Let $\xi, \xi_1, \xi_2, \ldots$ be independent identically distributed random variables,
\[ S_n = \sum_{i=1}^n \xi_i, \qquad \overline{S}_n = \max_{1 \le k \le n} S_k. \]
The main goal of this appendix is to obtain upper and lower bounds for the
probabilities $P(\overline{S}_n \ge x)$ and $P(S_n \ge x)$. These bounds were used in Sect. 9.5 to find the
asymptotics of the probabilities of large deviations for $\overline{S}_n$ and $S_n$.


8.1 Upper Bounds Under the Cramér Condition 


In this section we will assume that the following one-sided Cramér condition is met:

[C] There exists a $\lambda > 0$ such that
\[ \psi(\lambda) := E e^{\lambda\xi} < \infty. \tag{A8.1.1} \]
The following analogue of the exponential Chebyshev inequality holds true for
$P(\overline{S}_n \ge x)$.

Theorem A8.1.1 For all $n \ge 1$, $x \ge 0$ and $\lambda \ge 0$, we have
\[ P(\overline{S}_n \ge x) \le e^{-\lambda x}\max\bigl(1, \psi^n(\lambda)\bigr). \tag{A8.1.2} \]
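Proof As $\eta(x) := \inf\{k \ge 1 : S_k \ge x\} \le \infty$ is a Markov time, the event $\{\eta(x) = k\}$
is independent of the random variables $S_n - S_k$. Therefore
\[ \psi^n(\lambda) = E e^{\lambda S_n} \ge \sum_{k=1}^n E\bigl(e^{\lambda S_n};\ \eta(x) = k\bigr) \ge \sum_{k=1}^n E\bigl(e^{\lambda(x + S_n - S_k)};\ \eta(x) = k\bigr) \]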

A.A. Borovkov, Probability Theory, Universitext, 703 
DOI 10.1007/978-1-4471-5201-9, © Springer-Verlag London 2013 




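\[ = e^{\lambda x}\sum_{k=1}^n \psi^{n-k}(\lambda)\,P(\eta(x) = k) \ge e^{\lambda x}\min\bigl(1, \psi^n(\lambda)\bigr)\,P(\overline{S}_n \ge x). \]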


This immediately implies (A8.1.2). The theorem is proved. 
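As an illustration of (A8.1.2), the following minimal sketch compares the bound with an exact computation. It assumes a particular step distribution that is not taken from the text: steps equal to +1 or -1 with probability 1/2 each, so that $\psi(\lambda) = \cosh\lambda$. The function names are hypothetical.

```python
import math

def max_tail_simple_walk(n, x):
    """Exact P(max_{1<=k<=n} S_k >= x) for i.i.d. steps +1/-1 with probability 1/2 each.

    x is assumed to be a positive integer.
    """
    # alive[s] = probability that the walk is at s and has not yet reached level x
    alive = {0: 1.0}
    hit = 0.0
    for _ in range(n):
        nxt = {}
        for s, p in alive.items():
            for step in (-1, 1):
                t = s + step
                if t >= x:
                    hit += 0.5 * p          # level x reached for the first time
                else:
                    nxt[t] = nxt.get(t, 0.0) + 0.5 * p
        alive = nxt
    return hit

def cramer_bound(n, x, lam):
    """Right-hand side of (A8.1.2) for the +1/-1 step (assumed example)."""
    psi = math.cosh(lam)                    # psi(lambda) = E exp(lambda * xi)
    return math.exp(-lam * x) * max(1.0, psi ** n)

n, x = 20, 10
exact = max_tail_simple_walk(n, x)
best = min(cramer_bound(n, x, k / 100) for k in range(0, 301))
print(exact, best)
```

The exact probability stays below the bound for every $\lambda$ on the grid, including the smallest bound found.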


If $\psi(\lambda) \ge 1$ for $\lambda \ge 0$ (this is always the case if there exists $E\xi \ge 0$) then the
right-hand side of (A8.1.2) is equal to $e^{-\lambda x}\psi^n(\lambda)$, and the inequality (A8.1.2) itself
can also be obtained as a consequence of the well-known Kolmogorov-Doob
inequality for submartingales (see Theorem 15.3.4, where one has to put $X_n := e^{\lambda S_n}$).

Thus, if $E\xi \ge 0$ then
\[ P(\overline{S}_n \ge x) \le e^{-\lambda x}\psi^n(\lambda). \]

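Choosing the best possible value of $\lambda$ we obtain the following inequality.


Corollary A8.1.1 If $E\xi \ge 0$ then, for all $n \ge 1$ and $x \ge 0$, we have
\[ P(\overline{S}_n \ge x) \le e^{-n\Lambda(\alpha)}, \]
where
\[ \alpha := \frac{x}{n}, \qquad \Lambda(\alpha) := \sup_{\lambda}\bigl(\lambda\alpha - \ln\psi(\lambda)\bigr). \]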


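The function $\Lambda(\alpha)$ is the rate function introduced in Sect. 9.1. Its basic properties
were stated in that section. In particular, for $E\xi = 0$ and $E\xi^2 = \sigma^2 < \infty$, the
asymptotic equivalence $\Lambda(\alpha) \sim \alpha^2/(2\sigma^2)$ as $\alpha \to 0$ takes place, which yields that, for
$x = o(n)$,
\[ P(\overline{S}_n \ge x) \le \exp\Bigl\{-\frac{x^2}{2n\sigma^2}\bigl(1 + o(1)\bigr)\Bigr\}. \tag{A8.1.3} \]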
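The definition of $\Lambda(\alpha)$ and the equivalence $\Lambda(\alpha) \sim \alpha^2/(2\sigma^2)$ can be checked numerically. The sketch below is only an illustration under assumptions: it evaluates the supremum over $\lambda \in [0, 50]$ by ternary search, which is valid because $\lambda\alpha - \ln\psi(\lambda)$ is concave in $\lambda$. The two example laws (standard normal and the symmetric +1/-1 step, both with $\sigma = 1$) and the helper name `rate_function` are choices made here, not taken from the text.

```python
import math

def rate_function(alpha, log_psi, lam_max=50.0, iters=200):
    """Approximate Lambda(alpha) = sup_{0 <= lam <= lam_max} (lam*alpha - log_psi(lam)).

    Ternary search is used; it relies on the concavity of the objective in lam.
    """
    lo, hi = 0.0, lam_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if m1 * alpha - log_psi(m1) < m2 * alpha - log_psi(m2):
            lo = m1
        else:
            hi = m2
    lam = (lo + hi) / 2
    return lam * alpha - log_psi(lam)

log_psi_normal = lambda lam: lam * lam / 2             # standard normal step
log_psi_sign = lambda lam: math.log(math.cosh(lam))    # step +1/-1 with probability 1/2

print(rate_function(0.5, log_psi_normal))   # exact value is alpha^2 / 2
print(rate_function(0.5, log_psi_sign))
print(rate_function(0.01, log_psi_sign) / (0.01 ** 2 / 2))
```

For the normal law the rate function equals $\alpha^2/2$ exactly, while for the +1/-1 step the ratio $\Lambda(\alpha)/(\alpha^2/2)$ approaches 1 only as $\alpha \to 0$.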


8.2 Upper Bounds when the Cramér Condition Is Not Met 


In this section we will assume that
\[ E\xi = 0, \qquad E\xi^2 = \sigma^2 < \infty. \tag{A8.2.1} \]
For simplicity’s sake, without losing generality, in what follows we will put $\sigma = 1$.
The bounds will be obtained for the deviation zone $x > \sqrt{n}$ which is adjacent to the
zone of “normal deviations” where
\[ P(S_n \ge x) \sim 1 - \Phi\Bigl(\frac{x}{\sqrt{n}}\Bigr) \tag{A8.2.2} \]




(uniformly in $x \in (0, N_n\sqrt{n})$, where $N_n \to \infty$ slowly enough as $n \to \infty$; see
Sect. 8.2). Moreover, it was established in Sect. 19.1 that, in the normal deviations
zone,
\[ P(\overline{S}_n \ge x) \sim 2\Bigl(1 - \Phi\Bigl(\frac{x}{\sqrt{n}}\Bigr)\Bigr). \tag{A8.2.3} \]


To derive upper bounds in the zone $x > \sqrt{n}$ when the Cramér condition [C]
is not met, we will need additional conditions on the behaviour of the right tail
$F_+(t) = P(\xi \ge t)$ of the distribution F.

Namely, we will assume that the following condition is satisfied.


[<] For the right tail $F_+(t) = P(\xi \ge t)$ there exists a regularly varying (see
Appendix 6) majorant $V(t)$:
\[ F_+(t) \le V(t) := t^{-\beta}L(t) \quad \text{for all } t > 0, \]
where $\beta > 2$ and $L$ is a slowly varying function (s.v.f., see Appendix 6).


By virtue of (A8.2.2) and (A8.2.3), for deviations $x \le N_n\sqrt{n}$, $n \to \infty$, it would
be natural to expect upper bounds with an exponential right-hand side $e^{-x^2/(2n)}$
(cf. (A8.1.3)). On the other hand, Theorem A6.4.3(iii) of Appendix 6 implies that,
for $F_+(t) = V(t) \in \mathcal{R}$ and any fixed $n$ we have, as $x \to \infty$,
\[ P(S_n \ge x) \sim nV(x). \tag{A8.2.4} \]
This relation clearly holds true if $n \to \infty$ slowly enough (as $x \to \infty$).
The asymptotics (A8.2.2) and (A8.2.4) merge with each other remarkably as
follows:
\[ P(S_n \ge x) \sim \Bigl(1 - \Phi\Bigl(\frac{x}{\sqrt{n}}\Bigr)\Bigr) + nV(x) \tag{A8.2.5} \]
as $n \to \infty$ for all $x > \sqrt{n}$ (for more details see, e.g., [8] and the bibliography
therein). Relation (A8.2.5) allows us to “guess” the threshold values of $x = b(n)$
for which asymptotics (A8.2.2) changes to asymptotics (A8.2.4). To find such $x$ it
suffices to equate the logarithms of the right-hand sides of (A8.2.2) and (A8.2.4):
\[ -\frac{x^2}{2n} = \ln nV(x) = \ln n - \beta\ln x + o(\ln x). \]
The main part $b(n)$ of the solution to this equation, as it is not hard to see, has the
form
\[ b(n) = \sqrt{(\beta - 2)\,n\ln n} \]
(we exclude the trivial case $n = 1$).
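How well $b(n)$ captures the root of this equation can be seen numerically. The sketch below is an illustration under assumptions: it takes $L \equiv 1$, drops the $o(\ln x)$ term, and solves $x^2/(2n) = -\ln(n x^{-\beta})$ by bisection. Writing $x^2 = ny$ turns the equation into $y/2 = (\beta/2 - 1)\ln n + (\beta/2)\ln y$. The function name `crossover_ratio` is hypothetical.

```python
import math

def crossover_ratio(n, beta):
    """Return x*/b(n), where x* solves x^2/(2n) = -ln(n * x**(-beta)) with L = 1
    and b(n) = sqrt((beta - 2) * n * ln n). Assumes beta > 2 and n large."""
    log_n = math.log(n)

    def g(y):
        # With x^2 = n*y the equation reads g(y) = 0.
        return y / 2 - (beta / 2 - 1) * log_n - (beta / 2) * math.log(y)

    # g is increasing on [beta, infinity); for large n, g(beta) < 0 < g(hi).
    lo, hi = beta, 100 * log_n + 100
    for _ in range(200):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    y = (lo + hi) / 2
    return math.sqrt(y / ((beta - 2) * log_n))

print(crossover_ratio(1e6, 3.0))
print(crossover_ratio(1e100, 3.0))
```

The ratio exceeds 1 and approaches it only slowly, because of the $\ln\ln n$ correction hidden in $\beta\ln x$. This agrees with calling $b(n)$ the main part of the solution.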
In what follows, we will represent deviations x as x = sb(n). Based on the above, 
it is natural to expect (and it can be easily verified) that the first term will dominate 


on the right-hand side of (A8.2.5) if $s < 1$, while the second will dominate if $s > 1$.
Accordingly, for small $s$ (but such that $x > \sqrt{n}$), we will have the above-mentioned
exponential bounds for $P(\overline{S}_n \ge x)$, while for large $s$ there will hold bounds of the
form $nV(x)$ (note that $nV(x) \to 0$ for $x \ge b(n)$ and $\beta > 2$).


The above claim is confirmed by the assertions below. Along with $x$ introduce
deviations
\[ y = \frac{x}{r}, \]
where $r \ge 1$ is fixed, and put
\[ B_j := \{\xi_j < y\}, \qquad B := \bigcap_{j=1}^n B_j. \]


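Theorem A8.2.1 Let conditions (A8.2.1) and [<] be satisfied.
(1) For any fixed $h > 1$, $s_0 > 0$, $x = sb(n)$, $s \ge s_0$ and all $\Pi := nV(x)$ small
enough, we have
\[ P := P(\overline{S}_n \ge x;\ B) \le e^r\Bigl(\frac{\Pi(y)}{r}\Bigr)^{r-\theta}, \tag{A8.2.6} \]
where
\[ \Pi(y) := nV(y), \qquad \theta := \frac{hr^2}{4s^2}\Bigl(1 + b\,\frac{\ln s}{\ln n}\Bigr), \qquad b := \frac{2\beta}{\beta - 2}. \]
(2) For any fixed $h > 1$, $\tau > 0$, for $x = sb(n) > \sqrt{n}$, $s^2 < (h - \tau)/2$, and all $n$
large enough, we have
\[ P \le e^{-x^2/(2nh)}. \tag{A8.2.7} \]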
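Corollary A8.2.1 (a) If $s \to \infty$ then
\[ P(\overline{S}_n \ge x) \le nV(x)(1 + o(1)). \tag{A8.2.8} \]
(b) If $s^2 \ge s_0^2$ for some fixed $s_0 > 1$ then, for all $nV(x)$ small enough,
\[ P(\overline{S}_n \ge x) \le cnV(x), \qquad c = \mathrm{const}. \tag{A8.2.9} \]
(c) For any fixed $h > 1$, $\tau > 0$, for $s^2 < (h - \tau)/2$, $x > \sqrt{n}$, and all $n$ large
enough,
\[ P(\overline{S}_n \ge x) \le e^{-x^2/(2nh)}. \tag{A8.2.10} \]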


Remark A8.2.1 It is not hard to verify (see the proofs of Theorem A8.2.1 and
Corollary A8.2.1) that there exists a function $\varepsilon(t) \downarrow 0$ as $t \uparrow \infty$ such that one has, along
with (A8.2.8), the relation
\[ \sup_{x:\ s \ge t} \frac{P(\overline{S}_n \ge x)}{nV(x)} \le 1 + \varepsilon(t). \]


Proof of Corollary A8.2.1 The proof is based on the inequality
\[ P(\overline{S}_n \ge x) \le P(\overline{B}) + P(\overline{S}_n \ge x;\ B) \le nV(y) + P. \tag{A8.2.11} \]
Since $\theta \to 0$ as $s \to \infty$, we see that, for any fixed $\varepsilon > 0$ and all $\Pi = nV(x)$ small
enough, we have $P \le c(nV(y))^{r-\varepsilon}$. Putting $r := 1 + 2\varepsilon$, we obtain from (A8.2.11)
and (A8.2.6) that
\[ P(\overline{S}_n \ge x) \le nV(y) + c(nV(y))^{1+\varepsilon} \sim (1 + 2\varepsilon)^{\beta}\,nV(x). \]
Since the left-hand side of this inequality does not depend on $\varepsilon$, relation (A8.2.8)
follows.

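We now prove (b). If $s \to \infty$ then (b) follows from (a). If $s$ is bounded then
necessarily $n \to \infty$ (since $nV(x) \to 0$) and hence
\[ r - \theta = r - \frac{hr^2}{4s^2}\Bigl(1 + b\,\frac{\ln s}{\ln n}\Bigr) = \psi(r, s) + o(1), \]
where the function
\[ \psi(r, s) := r - \frac{hr^2}{4s^2} \]
attains its maximum $\psi(r_0, s) = s^2/h$ in $r$ at the point $r_0 = 2s^2/h$. Moreover, $\psi(r, s)$
strictly increases in $s$. Therefore, for $r_0 = 2s_0^2/h$, we obtain
\[ \psi(r_0, s) \ge \psi(r_0, s_0) = \frac{s_0^2}{h}, \tag{A8.2.12} \]
\[ r_0 - \theta \ge \frac{s_0^2}{h} + o(1) \quad \text{as } n \to \infty. \tag{A8.2.13} \]
Choose $h$ so close to 1 and $\tau > 0$ so small that $h + \tau \le s_0^2$. Putting $r := r_0$, for
$s^2 \ge s_0^2 \ge h + \tau$ and as $n \to \infty$, we get from (A8.2.6), (A8.2.12) and (A8.2.13) that
\[ P(\overline{S}_n \ge x) \le nV(y) + c(nV(y))^{1+\tau/2} \sim nV\Bigl(\frac{x}{r_0}\Bigr) \sim r_0^{\beta}\,nV(x). \]
This proves (b).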
Relation (c) for $y = x$ follows from the inequality (see (A8.2.7) and (A8.2.11))
\[ P(\overline{S}_n \ge x) \le nV(x) + e^{-x^2/(2nh)}, \tag{A8.2.14} \]


where, for $s^2 < (h - \tau)/2$,
\[ e^{-x^2/(2nh)} \ge \exp\Bigl\{-\frac{(h - \tau)(\beta - 2)\,n\ln n}{2 \cdot 2nh}\Bigr\} > n^{-(\beta-2)/4}. \]
On the other hand, we have $x > \sqrt{n}$,
\[ nV(x) \le nV(\sqrt{n}) = n^{-(\beta-2)/2}L^*(n), \]
where $L^*$ is a s.v.f. Therefore the second term dominates on the right-hand side
of (A8.2.14). Slightly changing $h$ if necessary, we obtain (c). Corollary A8.2.1 is
proved.


Remark A8.2.2 One can see from the proof of the corollary that the main contribution
to the bound for the probability $P(\overline{S}_n \ge x)$ under the conditions of assertions
(a) and (b) comes from the event $\overline{B} = \{\max_{j \le n}\xi_j \ge y\}$ with $y$ close to $x$, so that
the most probable trajectory of $\{S_j\}_{j=1}^n$ that reaches the level $x$ contains at least one
jump $\xi_j$ of size comparable to $x$.


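Proof of Theorem A8.2.1 In our case, the Cramér condition [C] is not met. In order
to use Theorem A8.1.1 in such a situation, we introduce “truncated” random variables
with distributions that coincide with the conditional distribution of $\xi$ given
$\{\xi < y\}$ for some level $y$ the choice of which will be at our disposal. Namely, we
introduce independent identically distributed random variables $\xi_j^{(y)}$, $j = 1, 2, \ldots$,
with the distribution function
\[ P(\xi^{(y)} < t) = P(\xi < t \mid \xi < y) = \frac{P(\xi < t)}{P(\xi < y)}, \qquad t \le y, \]
and put
\[ S_n^{(y)} := \sum_{j=1}^n \xi_j^{(y)}, \qquad \overline{S}_n^{(y)} := \max_{k \le n} S_k^{(y)}. \]
Then
\[ P = P(\overline{S}_n \ge x;\ B) = \bigl(P(\xi < y)\bigr)^n\,P(\overline{S}_n^{(y)} \ge x). \tag{A8.2.15} \]
Applying Theorem A8.1.1 to the variables $\xi_j^{(y)}$ we obtain that, for any $\lambda \ge 0$,
\[ P(\overline{S}_n^{(y)} \ge x) \le e^{-\lambda x}\bigl[\max\{1, E e^{\lambda\xi^{(y)}}\}\bigr]^n. \]
Since
\[ E e^{\lambda\xi^{(y)}} = \frac{R(\lambda, y)}{P(\xi < y)}, \qquad \text{where } R(\lambda, y) := \int_{-\infty}^{y} e^{\lambda t}\,F(dt), \]
we arrive at the following basic inequality. For $x, y, \lambda \ge 0$,
\[ P = P(\overline{S}_n \ge x;\ B) \le e^{-\lambda x}\bigl[\max\{P(\xi < y), R(\lambda, y)\}\bigr]^n \le e^{-\lambda x}\max\{1, R^n(\lambda, y)\}. \tag{A8.2.16} \]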
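Thus, the main problem is to bound the integral $R(\lambda, y)$. Put
\[ M(v) := \frac{v}{\lambda} \]
and represent $R(\lambda, y)$ as
\[ R(\lambda, y) = I_1 + I_2, \]
where, for a fixed $\varepsilon > 0$,
\[ I_1 := \int_{-\infty}^{M(\varepsilon)} e^{\lambda t}\,F(dt) = \int_{-\infty}^{M(\varepsilon)}\Bigl(1 + \lambda t + \frac{\lambda^2t^2}{2}e^{\lambda\theta(t)}\Bigr)F(dt), \qquad 0 \le \frac{\theta(t)}{t} \le 1. \tag{A8.2.17} \]
Here
\[ \int_{-\infty}^{M(\varepsilon)} F(dt) = 1 - V(M(\varepsilon)) \le 1, \]
\[ \int_{-\infty}^{M(\varepsilon)} tF(dt) = -\int_{M(\varepsilon)}^{\infty} tF(dt) \le 0, \tag{A8.2.18} \]
\[ \int_{-\infty}^{M(\varepsilon)} t^2e^{\lambda\theta(t)}F(dt) \le e^{\varepsilon}\int_{-\infty}^{M(\varepsilon)} t^2F(dt) \le e^{\varepsilon} =: h. \tag{A8.2.19} \]
Therefore,
\[ I_1 \le 1 + \frac{\lambda^2h}{2}. \tag{A8.2.20} \]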
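Estimate now
\[ I_2 := -\int_{M(\varepsilon)}^{y} e^{\lambda t}\,dF_+(t) \le V(M(\varepsilon))e^{\varepsilon} + \lambda\int_{M(\varepsilon)}^{y} V(t)e^{\lambda t}\,dt. \tag{A8.2.21} \]
First consider, for $M(\varepsilon) < M(2\beta) < y$, the subintegral
\[ I_{2,1} := \lambda\int_{M(\varepsilon)}^{M(2\beta)} V(t)e^{\lambda t}\,dt. \]
For $t = v/\lambda$, as $\lambda \to 0$, we have
\[ V(t)e^{\lambda t} = V\Bigl(\frac{v}{\lambda}\Bigr)e^{v} \sim V\Bigl(\frac{1}{\lambda}\Bigr)f(v), \tag{A8.2.22} \]
where the function
\[ f(v) := v^{-\beta}e^{v} \]
is convex on $(0, \infty)$. Therefore
\[ I_{2,1} \le \frac{\lambda}{2}\bigl(M(2\beta) - M(\varepsilon)\bigr)V\Bigl(\frac{1}{\lambda}\Bigr)\bigl(f(\varepsilon) + f(2\beta)\bigr)(1 + o(1)) \le cV\Bigl(\frac{1}{\lambda}\Bigr). \tag{A8.2.23} \]
We now proceed to estimating the remaining subintegral
\[ I_{2,2} := \lambda\int_{M(2\beta)}^{y} V(t)e^{\lambda t}\,dt. \]
For brevity’s sake, put $M(2\beta) =: M$. We will choose $\lambda$ so that
\[ \mu := \lambda y \to \infty \quad (y \gg 1/\lambda) \tag{A8.2.24} \]
as $x \to \infty$. Substituting the variable $(y - t)\lambda =: u$ we obtain
\[ I_{2,2} = e^{\lambda y}V(y)\int_0^{(y-M)\lambda} V\Bigl(y - \frac{u}{\lambda}\Bigr)V^{-1}(y)\,e^{-u}\,du. \tag{A8.2.25} \]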
Consider the integral on the right-hand side of (A8.2.25). Since $1/\lambda \ll y$, the
integrand
\[ r_{y,\lambda}(u) := \frac{V(y - u/\lambda)}{V(y)} \]
converges to 1 for each fixed $u$. In order to use the dominated convergence theorem
which implies that the integral on the right-hand side of (A8.2.25) converges, as
$y \to \infty$, to
\[ \int_0^\infty e^{-u}\,du = 1, \tag{A8.2.26} \]
it remains to estimate the growth rate of the function $r_{y,\lambda}(u)$ as $u$ increases. By the
properties of r.v.f.s (see Theorem A6.2.1(iii) in Appendix 6), for all $\lambda$ small enough
(or $M$ large enough; recall that $y - u/\lambda \ge M$ in the integrand in (A8.2.25)), we have
\[ r_{y,\lambda}(u) \le \Bigl(1 - \frac{u}{\lambda y}\Bigr)^{-3\beta/2} =: g(u). \]
Since $g(0) = 1$ and $\lambda y - u \ge M\lambda = 2\beta$, in this domain
\[ (\ln g(u))' = \frac{3\beta}{2(\lambda y - u)} \le \frac{3\beta}{4\beta} = \frac{3}{4}, \qquad \ln g(u) \le \frac{3u}{4}, \qquad r_{y,\lambda}(u) \le e^{3u/4}. \]
This means that the integrand in (A8.2.25) is dominated by the exponential $e^{-u/4}$,
and the use of the dominated convergence theorem is justified. Therefore, due to the
convergence of the integral in (A8.2.25) to the limit (A8.2.26), we obtain
\[ I_{2,2} \sim e^{\lambda y}V(y)\int_0^\infty e^{-u}\,du = e^{\lambda y}V(y), \]
and it is not hard to find a function $\varepsilon(\mu) \downarrow 0$ as $\mu \uparrow \infty$ such that
\[ I_{2,2} \le e^{\lambda y}V(y)(1 + \varepsilon(\mu)). \tag{A8.2.27} \]


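Summarising (A8.2.20)-(A8.2.23) and (A8.2.27), we obtain
\[ R(\lambda, y) \le 1 + \frac{\lambda^2h}{2} + cV\Bigl(\frac{1}{\lambda}\Bigr) + V(y)e^{\lambda y}(1 + \varepsilon(\mu)), \tag{A8.2.28} \]
\[ R^n(\lambda, y) \le \exp\Bigl\{\frac{n\lambda^2h}{2} + cnV\Bigl(\frac{1}{\lambda}\Bigr) + nV(y)e^{\lambda y}(1 + \varepsilon(\mu))\Bigr\}. \tag{A8.2.29} \]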


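First take $\lambda$ to be the value
\[ \lambda = \frac{1}{y}\ln T \]
that “almost minimises” the function $-\lambda x + nV(y)e^{\lambda y}$, where $T := r/(nV(y))$, so
that $\mu = \ln T$. Note that, for such a choice of $\mu$ (or of $\lambda = y^{-1}\ln(r/\Pi(y))$), for
$\Pi(y) \to 0$ we have that $\mu = \lambda y \sim -\ln\Pi(y) \to \infty$ and hence that the assumption
$y \gg 1/\lambda$ we made in (A8.2.24) holds true. For such $\lambda$,
\[ R^n(\lambda, y) \le \exp\Bigl\{\frac{n\lambda^2h}{2} + cnV\Bigl(\frac{1}{\lambda}\Bigr) + r(1 + \varepsilon(\mu))\Bigr\}, \tag{A8.2.30} \]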


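where, by the properties of r.v.f.s,
\[ nV\Bigl(\frac{1}{\lambda}\Bigr) = nV\Bigl(\frac{y}{\ln T}\Bigr) \sim cnV\Bigl(\frac{y}{|\ln nV(y)|}\Bigr) \le cnV(y)\,|\ln nV(y)|^{\beta+\delta} \to 0, \qquad \delta > 0, \tag{A8.2.31} \]
as $nV(y) \to 0$. Therefore
\[ \ln P \le -r\ln T + r + \frac{nh}{2y^2}\ln^2T + \varepsilon_1(T) = \Bigl[-r + \frac{nh}{2y^2}\ln T\Bigr]\ln T + r + \varepsilon_1(T), \tag{A8.2.32} \]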


where $\varepsilon_1(T) \downarrow 0$ as $T \uparrow \infty$. If $x = sb(n)$, $b(n) = \sqrt{(\beta - 2)n\ln n}$, and $nV(x) \to 0$
then
\[ \ln T = -\ln nV(x) + O(1) = -\ln n + \beta\ln s + \frac{\beta}{2}\ln n + O\bigl(\ln L(sb(n))\bigr) + O(1) = \frac{\beta - 2}{2}\ln n\,\Bigl[1 + b\,\frac{\ln s}{\ln n}\Bigr](1 + o(1)), \tag{A8.2.33} \]
where $b = 2\beta/(\beta - 2)$ (the term $o(1)$ in the last equality appears because in our case either
$n \to \infty$ or $s \to \infty$). Hence, by (A8.2.32),
\[ \frac{nh}{2y^2}\ln T = \frac{hr^2}{4s^2}\Bigl(1 + b\,\frac{\ln s}{\ln n}\Bigr)(1 + o(1)), \]
\[ \ln P \le r - \Bigl[r - \frac{h'r^2}{4s^2}\Bigl(1 + b\,\frac{\ln s}{\ln n}\Bigr)\Bigr]\ln T \]
for any $h' > h > 1$ and $nV(x)$ small enough. This proves the first assertion of the
theorem.


We now prove the second assertion of the theorem for “small” values of $s$ such
that, for some $\tau > 0$,
\[ s^2 < \frac{h - \tau}{2}. \]
Since we always assume that $x > \sqrt{n}$, we also have
\[ s = \frac{x}{b(n)} > \frac{1}{\sqrt{(\beta - 2)\ln n}}, \]
and we can assume that $s^2 > n^{-\gamma}$ for some $\gamma > 0$ to be chosen below. This corresponds
to the following domain of the values of $x^2$:
\[ cn^{1-\gamma}\ln n < x^2 < \frac{(h - \tau)(\beta - 2)}{2}\,n\ln n. \tag{A8.2.34} \]
For such $x$, as will be shown below, the main contribution to the exponent on the
right-hand side of (A8.2.29) comes from the quadratic term $n\lambda^2h/2$, and we will set
\[ \lambda = \frac{x}{nh}. \]
nh 


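Then, for $y = x$ ($r = 1$, $\mu = x^2/(nh)$),
\[ \ln P \le -\lambda x + \frac{n\lambda^2h}{2} + cnV\Bigl(\frac{1}{\lambda}\Bigr) + nV(y)e^{\lambda y}(1 + \varepsilon(\mu)) = -\frac{x^2}{2nh} + cnV\Bigl(\frac{nh}{x}\Bigr) + nV(x)e^{x^2/(nh)}(1 + \varepsilon(\mu)). \tag{A8.2.35} \]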
x 


We show that the last two terms on the right-hand side are negligibly small as
$n \to \infty$. Indeed, by the second inequality in (A8.2.34),
\[ nV\Bigl(\frac{nh}{x}\Bigr) \le cnV\Bigl(\sqrt{\frac{n}{\ln n}}\Bigr) \to 0 \quad \text{as } n \to \infty. \]


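Further, by the first inequality in (A8.2.34),
\[ nV(x) \le n^{-(\beta-2)/2+\gamma'}, \]
where $\gamma' > 0$ can be made arbitrarily small by the choice of $\gamma$. Moreover, by (A8.2.34),
\[ \frac{x^2}{nh} \le \frac{(h - \tau)(\beta - 2)}{2h}\ln n = \Bigl[\frac{\beta - 2}{2} - \frac{\tau(\beta - 2)}{2h}\Bigr]\ln n. \]
Therefore
\[ nV(x)e^{x^2/(nh)} \le n^{-\tau(\beta-2)/(2h)+\gamma'} \to 0 \]
for $\gamma' < \tau(\beta - 2)/(2h)$ as $n \to \infty$.
Thus,
\[ \ln P \le -\frac{x^2}{2nh} + o(1). \]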

Since $x^2/n > 1$, the term $o(1)$ in the last relation can be omitted by slightly changing
$h > 1$. (Formally, we proved that, for any $h > 1$ and all $n$ large enough, inequality
(A8.2.7) is valid with the $h$ on its right-hand side replaced with $h' > h$, where we
can take, for instance, $h' = h + (h - 1)/2$. Since $h' > 1$ can also be made arbitrarily
close to 1 by the choice of $h$, the obtained relation is equivalent to the one from
Theorem A8.2.1.) This proves (A8.2.7).
The theorem is proved.


Comparing the assertions of Theorem A8.2.1 and Corollary A8.1.1, we see that,
roughly speaking, for $s < 1/2$ and for $s > 1$ one can obtain quite satisfactory and, in
a certain sense, unimprovable upper bounds for the probabilities $P$ and $P(\overline{S}_n \ge x)$.


8.3 Lower Bounds


In this section we will again assume that conditions (A8.2.1) are satisfied. The lower
bounds for $P(S_n \ge x)$ (they will clearly hold for $P(\overline{S}_n \ge x)$ as well) can be obtained
in a much simpler way than the upper bounds and need essentially no assumptions.


Theorem A8.3.1 Let $E\xi_j = 0$ and $E\xi_j^2 = 1$. Then, for $y = x + t\sqrt{n - 1}$,
\[ P(S_n \ge x) \ge nF_+(y)\Bigl(1 - t^{-2} - \frac{n - 1}{2}F_+(y)\Bigr). \tag{A8.3.1} \]


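Proof Put $G_n := \{S_n \ge x\}$ and $B_j := \{\xi_j < y\}$. Then
\[ P(S_n \ge x) \ge P\Bigl(G_n \cap \bigcup_{j=1}^n \overline{B}_j\Bigr) \ge \sum_{j=1}^n P(G_n\overline{B}_j) - \sum_{i<j\le n} P(G_n\overline{B}_i\overline{B}_j) \ge \sum_{j=1}^n P(G_n\overline{B}_j) - \frac{n(n-1)}{2}F_+^2(y). \]
Here, for $y = x + t\sqrt{n - 1}$,
\[ P(G_n\overline{B}_j) = \int_y^\infty P(S_{n-1} \ge x - u)\,F(du) \ge P(S_{n-1} \ge x - y)F_+(y) = P(S_{n-1} \ge -t\sqrt{n - 1})F_+(y) = \bigl(1 - P(S_{n-1} < -t\sqrt{n - 1})\bigr)F_+(y), \]
where, by the Chebyshev inequality,
\[ P(S_{n-1} < -t\sqrt{n - 1}) \le t^{-2}. \]
As a result we get
\[ P(S_n \ge x) \ge nF_+(y)(1 - t^{-2}) - \frac{n(n-1)}{2}F_+^2(y), \]
which is equivalent to (A8.3.1).
The theorem is proved.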
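The bound (A8.3.1) can be compared with an exact value in a small example. The sketch below assumes a two-point law chosen only for illustration (it is not from the text): $\xi = 3$ with probability 0.1 and $\xi = -1/3$ with probability 0.9, which has mean 0 and variance 1. All names are hypothetical.

```python
import math

P_UP, UP, DOWN = 0.1, 3.0, -1.0 / 3.0   # assumed example law: mean 0, variance 1

def right_tail(y):
    """F_+(y) = P(xi >= y) for the two-point law above."""
    if y <= DOWN:
        return 1.0
    if y <= UP:
        return P_UP
    return 0.0

def exact_tail_sum(n, x):
    """P(S_n >= x), using S_n = 3K - (n - K)/3 with K ~ Binomial(n, 0.1)."""
    total = 0.0
    for k in range(n + 1):
        s = UP * k + DOWN * (n - k)
        if s >= x:
            total += math.comb(n, k) * P_UP ** k * (1 - P_UP) ** (n - k)
    return total

def lower_bound(n, x, t):
    """Right-hand side of (A8.3.1) with y = x + t*sqrt(n - 1)."""
    y = x + t * math.sqrt(n - 1)
    f = right_tail(y)
    return n * f * (1 - t ** -2 - (n - 1) / 2 * f)

print(lower_bound(5, 0.0, 1.5), exact_tail_sum(5, 0.0))
```

For small $n$ the bound is far from sharp, and it becomes trivial (zero or negative) once $y$ exceeds the largest atom. Its value lies in the zone $nF_+(y) \to 0$, $t \to \infty$ treated in the corollary that follows.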


Corollary A8.3.1 If $x \to \infty$ and $x \gg \sqrt{n}$ then, as $t \to \infty$,
\[ P(S_n \ge x) \ge nF_+(y)(1 + o(1)). \tag{A8.3.2} \]
If, moreover, $F_+(u) \ge V(u) \in \mathcal{R}$ then
\[ P(S_n \ge x) \ge nV(x)(1 + o(1)). \]

Proof Since $y \ge x$, we have
\[ nF_+(y) \le ny^{-2} \le nx^{-2} = o(1). \]
This together with (A8.3.1) implies the first assertion of the corollary as $t \to \infty$. To
obtain the second one, in (A8.3.2) one should take $t \to \infty$ such that $t = o(x/\sqrt{n})$.
Then $y \sim x$ and $V(y) \sim V(x)$.
The corollary is proved.


Appendix 9 
Renewal Theorems 


The main goal of the present section is to prove Theorem 10.4.1, the key renewal
theorem in the non-arithmetic case (in the terminology of Chap. 10). We will also
consider some refinements and extensions of the theorem.

First consider positive independent identically distributed random variables
$\tau_j \overset{d}{=} \tau$ with distribution function $F$ and finite mean $a := E\tau < \infty$. Here it will be
more convenient to understand by the renewal function its left-continuous version
\[ H(t) := \sum_{k=0}^{\infty} F^{*k}(t), \qquad t \ge 0, \]
where $F^{*k}$ is the $k$-fold convolution of the distribution F with itself, which is the
distribution function of the sum $T_k = \tau_1 + \cdots + \tau_k$. We first prove the following key
assertion.

Theorem A9.1 If $g$ is a directly integrable function and $\tau_j$ are non-arithmetic (see
Chap. 10) then, as $t \to \infty$,
\[ \int_0^t g(t - u)\,dH(u) \to \frac{1}{a}\int_0^\infty g(u)\,du. \]

The proof of the theorem mostly follows the argument suggested in [13] and will 
need several auxiliary assertions. 


Lemma A9.1 Let $g$ be a bounded measurable function. The integral
\[ G(t) = \int_0^t g(t - u)\,dH(u) =: g * H(t) \tag{A9.1} \]
is the unique solution of the equation
\[ G(t) = g(t) + \int_0^t G(t - u)\,dF(u) = g(t) + G * F(t) \tag{A9.2} \]
in the class of functions bounded on finite intervals.
The function $G = H$ is the solution of (A9.2) when $g \equiv 1$. The function $G \equiv 1$ is
the solution of (A9.2) when $g = 1 - F$.


Equation (A9.2) is called the renewal equation. 

As we already noted in Theorem 10.4.1, one can associate, in an obvious way,
measures H and F with the functions $H$ and $F$, and write the integrals in (A9.1) and
(A9.2) as integrals with respect to the measures:
\[ \int_0^t g(t - u)\,H(du) \quad \text{and} \quad \int_0^t G(t - u)\,F(du), \]
respectively.


Proof of Lemma A9.1 Put
\[ H_n(t) := \sum_{k=0}^{n} F^{*k}(t). \]
The functions $G_n = g * H_n$ satisfy the equation $G_{n+1} = g + G_n * F$ and form an
increasing sequence $G_n \uparrow$ which is bounded by Lemma 10.2.3. Therefore $G_n \uparrow G$,
and passing to the limit in the equation for $G_n$ we obtain that $G$ satisfies (A9.2).
To prove uniqueness note that the difference $V = G^{(1)} - G^{(2)}$ of two solutions $G^{(1)}$
and $G^{(2)}$ must satisfy the homogeneous equation $V = V * F$ and therefore also the
relations $V = V * (F^{*k})$ or, which is the same,
\[ V(t) = \int_0^t V(t - u)\,dF^{*k}(u). \]
But $F^{*k}(u) \to 0$ as $k \to \infty$ for $u \in [0, t]$. Since by the assumption $|V(u)| \le c$ on
$[0, t]$, we have $V(t) \to 0$ as $k \to \infty$. But $V$ does not depend on $k$, so that $V(t) = 0$.
The last assertion of the lemma can be verified directly. The lemma is proved.
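The two special solutions named in Lemma A9.1 can be observed numerically. The following sketch is an illustration under assumptions not made in the text: $F$ is taken to have a density $f$, so that (A9.2) becomes $G(t) = g(t) + \int_0^t G(t-u)f(u)\,du$, and this integral is discretised by the trapezoidal rule. The example law is exponential with $a = 1$, for which $H(t) = 1 + t$. The function name `solve_renewal` is hypothetical.

```python
import math

def solve_renewal(g, f, t_max, h):
    """Solve G(t) = g(t) + int_0^t G(t-u) f(u) du on the grid 0, h, 2h, ..., t_max.

    f is the density of F (assumed to exist); the integral uses the trapezoidal rule.
    Returns the list of values G(0), G(h), ..., G(t_max).
    """
    n = int(round(t_max / h))
    fv = [f(j * h) for j in range(n + 1)]
    G = [g(0.0)]
    for i in range(1, n + 1):
        acc = 0.5 * G[0] * fv[i]            # endpoint u = t
        for j in range(1, i):
            acc += G[i - j] * fv[j]         # interior nodes u = j*h
        # the unknown G[i] enters through the endpoint u = 0, so solve for it
        G.append((g(i * h) + h * acc) / (1.0 - 0.5 * h * fv[0]))
    return G

density = lambda u: math.exp(-u)                                 # exponential law, a = 1
H = solve_renewal(lambda t: 1.0, density, 5.0, 0.01)             # g = 1      gives G = H
one = solve_renewal(lambda t: math.exp(-t), density, 5.0, 0.01)  # g = 1 - F  gives G = 1
print(H[-1], one[-1])
```

With $g \equiv 1$ the computed solution follows $H(t) = 1 + t$, and with $g = 1 - F$ it stays at 1 up to discretisation error.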


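Note that if we considered functions $g$ of bounded variation, the assertion of
Lemma A9.1 would immediately follow from the equation for the Laplace-Stieltjes
transform $\widehat{G}(\lambda) = \int_0^\infty e^{-\lambda t}\,dG(t)$ of $G$ which follows from (A9.2):
\[ \widehat{G}(\lambda) = \widehat{g}(\lambda) + \widehat{G}(\lambda)\psi(\lambda), \tag{A9.3} \]
where
\[ \widehat{g}(\lambda) := \int_0^\infty e^{-\lambda t}\,dg(t), \qquad \psi(\lambda) := \int_0^\infty e^{-\lambda t}\,dF(t). \]
Indeed, it follows from (A9.3) that
\[ \widehat{G}(\lambda) = \frac{\widehat{g}(\lambda)}{1 - \psi(\lambda)}, \]
which is equivalent to (A9.1).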
A point $t$ is said to be a point of growth of the distribution function $F$ provided
that $F(t + \varepsilon) - F(t) > 0$ for any $\varepsilon > 0$.


Lemma A9.2 Let the distribution F be non-arithmetic and $Z$ be the set of all points
of growth of $H$, i.e. points of growth of the functions $F, F^{*2}, F^{*3}, \ldots$. Then $Z$ is
“asymptotically dense at infinity”, i.e., for any given $\varepsilon > 0$ and all $x$ large enough,
the intersection $(x, x + \varepsilon) \cap Z$ is non-empty.


Proof Observe first that if $t_1$ is a point of growth of the distribution $F_1$ of a random
variable $\tau$, and $t_2$ is a point of growth of the distribution $F_2$ of a random variable $\zeta$
which is independent of $\tau$, then $t = t_1 + t_2$ will be a point of growth of the distribution
$F_1 * F_2$ of the variable $\tau + \zeta$. Indeed,
\[ P(t \le \tau + \zeta < t + \varepsilon) \ge P\Bigl(t_1 \le \tau < t_1 + \frac{\varepsilon}{2}\Bigr)\,P\Bigl(t_2 \le \zeta < t_2 + \frac{\varepsilon}{2}\Bigr). \]
Let, further, $x < y$ be two points of the set $Z$, and $\Delta := y - x$. The following
alternative takes place: either

(1) for any $\varepsilon > 0$ there exist $x$ and $y$ such that $\Delta < \varepsilon$, or

(2) there exists a $\delta > 0$ such that $\Delta \ge \delta$ for all $x$ and $y$ from $Z$.

Put $I_n := [nx, ny]$. If $n\Delta > x$ then that interval contains $[nx, (n + 1)x]$ as a
subset, and therefore any point $v > v_0 = x^2/\Delta$ belongs to at least one of the intervals
$I_1, I_2, \ldots$.

By virtue of the above observation, the $n + 1$ points $nx + k\Delta = (n - k)x + ky$,
$k = 0, \ldots, n$, belong to $Z$ and divide $I_n$ into $n$ subintervals of length $\Delta$. This means that,
for any point $v > v_0$, the distance between $v$ and the points from $Z$ is at most $\Delta/2$.

This implies the assertion of the lemma when (1) holds.

If (2) is true, we can assume that $x$ and $y$ are chosen so that $\Delta < 2\delta$. Then the
points of the form $nx + k\Delta$ exhaust all the points from $Z$ lying inside $I_n$. Since the
point $(n + 1)x$ is among these points, the value $x$ is a multiple of $\Delta$, and all the
points of $Z$ lying inside $I_n$ are multiples of $\Delta$. Now let $z$ be an arbitrary point of
growth of $F$. For sufficiently large $n$, the interval $I_n$ contains a point of the form
$z + k\Delta$, and since the latter belongs to $Z$, the value $z$ is also a multiple of $\Delta$. Thus
F is an arithmetic distribution, so that case (2) cannot take place. The lemma is
proved.


Lemma A9.3 Let $q(x)$ be a bounded uniformly continuous function given on
$(-\infty, \infty)$ such that $q(x) \le q(0)$ for all $x$, and
\[ q(x) = \int_0^\infty q(x - y)\,dF(y). \tag{A9.4} \]
Then $q(x) \equiv q(0)$.




Proof Equation (A9.4) means that $q = q * F = \cdots = q * F^{*k}$ for all $k \ge 1$. The
right-hand side of (A9.4) does not exceed $q(0)$, and hence, for $x = 0$, the equality
(A9.4) is only possible if $q(-y) = q(0)$ for all $y \in Z_k$, where $Z_k$ is the set of points
of growth of $F^{*k}$, and therefore $q(-y) = q(0)$ for all $y \in Z$. By Lemma A9.2 and
the uniform continuity of $q$ this means that $q(-y) \to q(0)$ as $y \to \infty$. Further,
for an arbitrarily large $N$ we can choose $k$ such that $q(x)$ will be arbitrarily close
to $\int_N^\infty q(x - y)\,dF^{*k}(y)$, since $F^{*k}(N) \to 0$ as $k \to \infty$. This means, in turn, that
$q(x)$ will be close to $q(0)$. Since $q(x)$ depends neither on $N$ nor on $k$, we have
$q(x) \equiv q(0)$. The lemma is proved.


Lemma A9.4 Let $g$ be a continuous function vanishing outside segment $[0, b]$. Then
the solution $G$ of the renewal equation (A9.2) is uniformly continuous and, for
any $u$,
\[ G(x + u) - G(x) \to 0 \tag{A9.5} \]
as $x \to \infty$.


Proof By virtue of Lemma 10.2.3,
\[ |G(x + \delta) - G(x)| = \Bigl|\int_{x-b}^{x+\delta}\bigl(g(x + \delta - y) - g(x - y)\bigr)\,dH(y)\Bigr| \le \max_{x}|g(x + \delta) - g(x)|\,\bigl(c_1 + c_2(b + \delta)\bigr). \tag{A9.6} \]
This means that the uniform continuity of $g$ implies that of $G$.
Now assume that $g$ has a continuous derivative $g'$. Then $G'$ exists and satisfies
the renewal equation
\[ G'(x) = g'(x) + \int_0^x G'(x - y)\,dF(y). \]
Therefore the derivative $G'$ is bounded and uniformly continuous. Let
\[ \limsup_{x \to \infty} G'(x) = s. \]
Choose a sequence $t_n \to \infty$ such that $G'(t_n) \to s$. The family of functions $q_n$
defined by the equalities
\[ q_n(x) = G'(t_n + x) \]
is equicontinuous, and
\[ q_n(x) = g'(t_n + x) + \int_0^{x+t_n} q_n(x - y)\,dF(y) = g'(t_n + x) + \int_0^\infty q_n(x - y)\,dF(y). \tag{A9.7} \]




By the Arzelà-Ascoli theorem (see Appendix 4) there exists a subsequence $t_{n'}$
such that $q_{n'}$ converges to a limit $q$. From (A9.7) it follows that this limit
satisfies the conditions of Lemma A9.3, and therefore $q(x) \equiv q(0) = s$ for all $x$. Thus
$G'(t_{n'} + x) \to s$ for all $x$, and hence
\[ G(t_{n'} + x) - G(t_{n'}) \to sx. \]
Since the last relation holds for any $x$ and the function $G$ is bounded, we get $s = 0$.

We have proved the lemma for continuously differentiable $g$. But an arbitrary
continuous function $g$ vanishing outside $[0, b]$ can be approximated by a continuously
differentiable function $g_1$ which also vanishes outside that interval. Let $G_1$
be the solution of the renewal equation corresponding to the function $g_1$. Then
$|g - g_1| < \varepsilon$ implies $|G - G_1| < c\varepsilon$, $c = c_1 + c_2b$ (see Lemma 10.2.3), and therefore
\[ |G(x + u) - G(x)| < (2c + 1)\varepsilon \]
for all sufficiently large $x$. This proves (A9.5) for arbitrary continuous functions $g$.
The lemma is proved.


Proof of Theorem A9.1 Consider an arbitrary sequence $t_n \to \infty$ and the measures
$\mu_n$ generated by the functions
\[ H_{(n)}(u) = H(t_n + u) - H(t_n) \qquad \bigl(\mu_n([u, v)) = H_{(n)}(v) - H_{(n)}(u)\bigr). \]
These functions satisfy the conditions of the generalised Helly theorem (see
Appendix 4). Therefore there exists a subsequence $t_{n'}$, the respective subsequence of
measures $\mu_{n'}$, and the limiting measure $\mu$ such that $\mu_{n'}$ converges weakly to $\mu$ on
any finite interval as $n' \to \infty$.

Now let $g$ be a continuous function vanishing outside $[0, b]$. Then


0 
ees, =f Sean aed dct) 
b 


0 b 
=) (1) d(H tm +x +0) — Het) > f g(u)we + du). 
b 0 


By Lemma A9.4, the sequence $G(t_{n'} + y)$ will have the same limit. This means that
the measure $\mu(x + du)$ does not depend on $x$, and therefore $\mu([u, v))$ is proportional
to the length of the interval $(u, v)$:
\[ \mu([u, v)) = c(v - u), \qquad \mu(du) = c\,du. \]
Thus, we have proved that
\[ G(t_{n'} + x) \to c\int_0^\infty g(u)\,du \tag{A9.8} \]




for any continuous function $g$ vanishing outside $[0, b]$. But for any Riemann
integrable function $g$ on $[0, b]$ and given $\varepsilon > 0$ there exist continuous functions $g_1$ and
$g_2$, $g_1 \le g \le g_2$, which are equal to 0 outside $[0, b + 1]$ and such that
\[ \int_0^{b+1}\bigl(g_2(u) - g_1(u)\bigr)\,du < \varepsilon. \]
This means that convergence (A9.8) also holds for any Riemann integrable function
vanishing outside $[0, b]$.

Now consider an arbitrary directly integrable function $g$. By property (2) of such
functions (see Definition 10.4.1) one can choose a $b > 0$ such that for the function
\[ g_{(b)}(u) = \begin{cases} g(u) & \text{if } u \le b, \\ 0 & \text{if } u > b, \end{cases} \]
the left- and right-hand sides of (A9.8) will be arbitrarily close to the respective
expressions corresponding to the original function $g$ (for the right-hand side it is
obvious, while for the left-hand side it follows from the convergence
\[ \Bigl|\int_0^t g(t - s)\,dH(s) - \int_0^t g_{(b)}(t - s)\,dH(s)\Bigr| = \Bigl|\int_0^{t-b} g(t - s)\,dH(s)\Bigr| \le \sum_{k > b-1}(c_1 + c_2)\,\overline{g}_k \to 0 \]
as $b \to \infty$ (see Lemma 10.2.3)). Therefore (A9.8) is proved for any directly
integrable function $g$. Putting $g := 1 - F$ we obtain from Lemma A9.1
\[ 1 = c\int_0^\infty\bigl(1 - F(u)\bigr)\,du = ac, \qquad c = \frac{1}{a}. \]
Thus the limit in (A9.8) is one and the same for any initial sequence $t_n$. From this it
follows that, as $t \to \infty$,
\[ G(t) \to \frac{1}{a}\int_0^\infty g(u)\,du. \]
The theorem is proved.


Theorem 10.4.1 is a simple consequence of Theorem A9.1 and the argument
used in the proof of Theorem 10.2.3, where the key renewal theorem in the
arithmetic case was extended to the setting where $\tau_j$, $j \ge 2$, can assume values
of different signs, while $\tau_1$ is arbitrary. We will leave it to the reader to apply the
argument in the non-arithmetic case.

Now we will give several further consequences of Theorem A9.1. In Sect. 10.4
we obtained a refinement of the renewal theorem in the case when $m_2 := E\tau^2 < \infty$.
Approaches developed while proving Theorem A9.1 enable one to obtain
an alternative proof of the following assertion coinciding with Theorem 10.4.4.




Theorem A9.2 Let the conditions of Theorem A9.1 be met and $m_2 < \infty$. Then

\[ 0 \le H(t) - \frac{t}{a} \to \frac{m_2}{2a^2} \quad \text{as } t \to \infty. \]


Proof The function G(t) := H(t) − t/a is the solution of the renewal equation
(A9.2) corresponding to the function

\[ g(t) := \frac{1}{a} \int_t^\infty \bigl(1 - F(u)\bigr)\,du. \]


Since g is directly integrable, we have 


\[ G(t) \to \frac{1}{a^2} \int_0^\infty \int_v^\infty \bigl(1 - F(u)\bigr)\,du\,dv = \frac{m_2}{2a^2}. \]


The theorem is proved. 
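
The constant in Theorem A9.2 can be checked numerically on a simple case. The sketch below assumes, purely for illustration, that F is the uniform distribution on [0, 1] (so a = 1/2 and m_2 = 1/3), evaluates (1/a) times the integral of g by a composite midpoint rule, and compares it with m_2/(2a^2). The function names `tail`, `midpoint` and `g` are hypothetical helpers, not notation from the text.

```python
def tail(u):
    """1 - F(u) for the uniform distribution on [0, 1] (an assumed example)."""
    if u <= 0.0:
        return 1.0
    return 1.0 - u if u < 1.0 else 0.0

def midpoint(func, lo, hi, n):
    """Composite midpoint rule for the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))

a = 0.5          # E tau for uniform[0, 1]
m2 = 1.0 / 3.0   # E tau^2 for uniform[0, 1]

def g(t):
    """g(t) = (1/a) * int_t^inf (1 - F(u)) du; the tail vanishes beyond u = 1."""
    return midpoint(tail, t, 1.0, 200) / a if t < 1.0 else 0.0

limit_numeric = midpoint(g, 0.0, 1.0, 400) / a   # (1/a) * int_0^inf g(v) dv
limit_formula = m2 / (2.0 * a * a)               # m_2 / (2 a^2)
print(limit_numeric, limit_formula)
```

Both numbers should agree to several decimal places; the residual difference comes only from the midpoint rule in the outer integral.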


Theorem A9.3 (The local renewal theorem for densities) Assume that F has a den- 
sity f = F’ and this density is directly integrable. Then H has a density h = H', 
and 


\[ h(t) \to \frac{1}{a} \quad \text{as } t \to \infty. \]


Proof Denote by $f_n(x)$ the density of the sum $T_n = \tau_1 + \cdots + \tau_n$. We have

\[ h(t) = H'(t) = \sum_{n=1}^\infty f_n(t) = f(t) + \int_0^t h(t-u) f(u)\,du = f(t) + h * F(t). \]


This means that h(t) satisfies the renewal equation with the function g = f. There- 
fore by Theorem A9.1, 


\[ h(t) \to \frac{1}{a} \int_0^\infty f(u)\,du = \frac{1}{a}. \]


The theorem is proved. 
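
The series representation of h used in the proof can be evaluated directly in a case where the convolutions are explicit. The sketch below assumes, as an illustration only, that the renewal intervals have the Gamma(2, 1) density f(u) = u e^{-u}, so that a = 2 and T_n has the Gamma(2n, 1) density t^{2n-1} e^{-t} / (2n-1)!. The function name `renewal_density_erlang2` and the truncation level `terms` are hypothetical choices. In this case the series sums to (1 - e^{-2t})/2, which tends to 1/a = 1/2.

```python
import math

def renewal_density_erlang2(t, terms=400):
    """Partial sum of h(t) = sum_{n>=1} f_n(t) for Gamma(2, 1) renewal intervals.

    f_n(t) = t^(2n-1) * exp(-t) / (2n-1)!  is the Gamma(2n, 1) density; it is
    computed through logarithms to avoid overflow in the factorial. Requires t > 0.
    """
    total = 0.0
    for n in range(1, terms + 1):
        k = 2 * n - 1
        total += math.exp(k * math.log(t) - t - math.lgamma(k + 1))
    return total

a = 2.0
ts = (0.5, 2.0, 10.0, 30.0)
values = [renewal_density_erlang2(t) for t in ts]
for t, h in zip(ts, values):
    # compare the partial sum with the closed form and with the limit 1/a
    print(t, h, 0.5 * (1.0 - math.exp(-2.0 * t)), 1.0 / a)
```

The truncation at 400 terms is far more than needed for the values of t shown; for much larger t the number of terms would have to grow roughly in proportion to t.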

Consider now some extensions of Theorem A9.1. A function g given on the
whole line $(-\infty, \infty)$ is said to be directly integrable if both functions g(t) and
g(−t), $t \ge 0$, are directly integrable.


Theorem A9.4 If the conditions of Theorem A9.1 are met and g is directly integrable, then

\[ G(t) = \int_0^\infty g(t-u)\,dH(u) \to \frac{1}{a} \int_{-\infty}^{\infty} g(u)\,du \quad \text{as } t \to \infty. \]




The Proof can be obtained by making several small and quite obvious modifica- 
tions to the argument in the demonstration of Theorem A9.1. The main change is 
that instead of functions g vanishing outside [0, b] one should now consider func- 
tions vanishing outside [—b, b]. 

Another extension refers to the second version of the renewal function 


\[ U(t) := \sum_{k=0}^\infty \mathbf{P}(T_k \le t), \qquad -\infty < t < \infty, \]

in the case when $\tau_j$ can assume values of different signs.


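Theorem A9.5 If g is directly integrable and $\mathbf{E}\tau_j = a > 0$, then

\[ G(t) = \int_{-\infty}^{\infty} g(t-u)\,U(du) \to \frac{1}{a} \int_{-\infty}^{\infty} g(u)\,du \quad \text{as } t \to \infty, \]

and, for any fixed u, $U(t+u) - U(t) \to 0$ as $t \to -\infty$.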


The proof is also obtained by modifying the argument proving Theorem A9.1
(see [13]).


References 


1. Billingsley, P.: Convergence of Probability Measures. Wiley, New York (1968)

2. Billingsley, P.: Probability and Measure. Anniversary edn. Wiley, Hoboken (2012)

3. Borovkov, A.A.: Stochastic Processes in Queueing Theory. Springer, New York (1976)

4. Borovkov, A.A.: Convergence of measures and random processes. Russ. Math. Surv. 31, 1–69
(1976)

5. Borovkov, A.A.: Probability Theory. Gordon & Breach, Amsterdam (1998)

6. Borovkov, A.A.: Ergodicity and Stability of Stochastic Processes. Wiley, Chichester (1998)

7. Borovkov, A.A.: Mathematical Statistics. Gordon & Breach, Amsterdam (1998)

8. Borovkov, A.A., Borovkov, K.A.: Asymptotic Analysis of Random Walks. Heavy-Tailed
Distributions. Cambridge University Press, Cambridge (2008)

9. Cramér, H., Leadbetter, M.R.: Stationary and Related Stochastic Processes. Wiley, New York
(1967)

10. Dudley, R.M.: Real Analysis and Probability. Cambridge University Press, Cambridge (2002)


11. Feinstein, A.: Foundations of Information Theory. McGraw-Hill, New York (1995) 

12. Feller, W.: An Introduction to Probability Theory and Its Applications, vol. 1. Wiley, New 
York (1968) 

13. Feller, W.: An Introduction to Probability Theory and Its Applications, vol. 2. Wiley, New 
York (1971) 

14. Gikhman, I.I., Skorokhod, A.V.: Introduction to the Theory of Random Processes. Saunders, 
Philadelphia (1969) 

15. Gnedenko, B.V.: The Theory of Probability. Chelsea, New York (1962) 

16. Gnedenko, B.V., Kolmogorov, A.N.: Limit Distributions for Sums of Independent Random 
Variables. Addison-Wesley, Reading (1968) 

17. Gradshteyn, I.S., Ryzhik, I.M.: Table of Integrals, Series, and Products. Academic Press, New
York (1965). Fourth edition prepared by Ju.V. Geronimus and M.Ju. Ceitlin. Translated from
the Russian by Scripta Technica, Inc. Translation edited by Alan Jeffrey

18. Grenander, U.: Probabilities on Algebraic Structures. Almqvist & Wiksell, Stockholm (1963)

19. Halmos, P.R.: Measure Theory. Van Nostrand, New York (1950)

20. Ibragimov, I.A., Linnik, Yu.V.: Independent and Stationary Sequences of Random Variables.
Wolters-Noordhoff, Groningen (1971)

21. Khinchin, A.Ya.: Ponyatie entropii v teorii veroyatnostei (The concept of entropy in the theory
of probability). Usp. Mat. Nauk 8, 3–20 (1953) (in Russian)

22. Kifer, Yu.: Ergodic Theory of Random Transformations. Birkhäuser, Boston (1986)

23. Kolmogorov, A.N.: Markov chains with a countable number of possible states. In: Shiryaev,
A.N. (ed.) Selected Works of A.N. Kolmogorov, vol. 2, pp. 193–208. Kluwer Academic,
Dordrecht (1986)

A.A. Borovkov, Probability Theory, Universitext, 723 


DOI 10.1007/978-1-4471-5201-9, © Springer-Verlag London 2013 




24. Kolmogorov, A.N.: The theory of probability. In: Aleksandrov, A.D., et al. (eds.) Mathematics, 
Its Content, Methods, and Meaning, vol. 2, pp. 229-264. MIT Press, Cambridge (1963) 

25. Lamperti, J.: Probability: A Survey of the Mathematical Theory. Wiley, New York (1996) 

26. Loève, M.: Probability Theory. Springer, New York (1977)

27. Meyn, S.P., Tweedie, R.L.: Markov Chains and Stochastic Stability. Springer, New York 
(1993) 

28. Natanson, I.P.: Theory of Functions of a Real Variable. Ungar, New York (1961) 

29. Nummelin, E.: General Irreducible Markov Chains and Nonnegative Operators. Cambridge 
University Press, New York (1984) 

30. Petrov, V.V.: Sums of Independent Random Variables. Springer, New York (1975) 

31. Shiryaev, A.N.: Probability. Springer, New York (1984) 

32. Skorokhod, A.V.: Random Processes with Independent Increments. Kluwer Academic, Dor- 
drecht (1991) 

33. Tyurin, I.S.: An improvement of the residual in the Lyapunov theorem. Theory Probab. Appl. 
56(4) (2011) 


Index of Basic Notation 


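Spaces and σ-algebras
$\mathfrak{F}$ – a σ-algebra, 14
$(\Omega, \mathfrak{F})$ – a measurable space, 14
$\mathbb{R}$ – the real line, 17
$\mathbb{R}^n$ – n-dimensional Euclidean space, 18
$\mathfrak{B}$ – the σ-algebra of Borel-measurable subsets of $\mathbb{R}$, 17
$\mathfrak{B}^n$ – the σ-algebra of Borel-measurable subsets of $\mathbb{R}^n$, 18
$(\Omega, \mathfrak{F}, \mathbf{P})$ – the probability space, 17
(Note that $\Omega$ and $\mathfrak{F}$ can take specific values, i.e. $\mathbb{R}$ and $\mathfrak{B}$, respectively.)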


Distributions¹
$\mathbf{F}_\xi$, $\mathbf{F}$ – the distribution of the random variable $\xi$, 32, 32
$\mathbf{I}_a$ – the degenerate distribution (concentrated at the point a), 37
$\mathbf{U}_{a,b}$ – the uniform distribution on [a, b], 37
$\mathbf{B}_p$, $\mathbf{B}_p^n$ – the binomial distributions, 37
multinomial distributions, 47
$\mathbf{\Phi}_{a,\sigma^2}$ – the normal (Gaussian) distribution with parameters $(a, \sigma^2)$, 37, 48
$\phi_{a,\sigma^2}(x)$ – the density of the normal law with parameters $(a, \sigma^2)$, 41
$\mathbf{F}_{\beta,\rho}$ – the stable distribution with parameters $\beta$, $\rho$, 231, 233
$f^{(\beta,\rho)}(x)$ – the density of the stable distribution $\mathbf{F}_{\beta,\rho}$, 235
$\varphi^{(\beta,\rho)}(t)$ – the characteristic function of the distribution $\mathbf{F}_{\beta,\rho}$, 231
$\mathbf{K}_{a,\sigma}$ – the Cauchy distribution with parameters $(a, \sigma)$, 38
$\mathbf{\Gamma}_\alpha$ – the exponential distribution with parameter $\alpha$, 38, 177
$\mathbf{\Gamma}_{\alpha,\lambda}$ – the gamma-distribution with parameters $(\alpha, \lambda)$, 176
$\mathbf{\Pi}_\lambda$ – the Poisson distribution with parameter $\lambda$, 39
$\chi^2$ – the $\chi^2$-distribution, 177
$\Lambda(\alpha)$ – the large deviation rate function, 244


¹ (All distributions and measures are denoted by bold letters.)




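Relations

:= means that the left-hand side is defined by the right-hand side, xi

=: means that the right-hand side is defined by the left-hand side, xi

$\sim$ – notation $a_n \sim b_n$ ($a(x) \sim b(x)$) means that $\lim_{n\to\infty} a_n/b_n = 1$ ($\lim_{x\to\infty} a(x)/b(x) = 1$), 109

$\xrightarrow{p}$ – convergence of random variables in probability, 129

$\xrightarrow{a.s.}$ – almost sure convergence of random variables, 130

$\xrightarrow{(r)}$ – convergence of random variables in the mean, 132

$\stackrel{d}{=}$ – notation $\xi \stackrel{d}{=} \eta$ means that the distributions of $\xi$ and $\eta$ coincide, 144

$\stackrel{d}{\le}$ – notation $\xi \stackrel{d}{\le} \eta$ means that $\mathbf{P}(\xi \ge t) \le \mathbf{P}(\eta \ge t)$ for all t, 302

$\stackrel{d}{\ge}$ – notation $\xi \stackrel{d}{\ge} \eta$ means that $\mathbf{P}(\xi \ge t) \ge \mathbf{P}(\eta \ge t)$ for all t, 302

$\subset\!\!\!=$ – notation $\xi \subset\!\!\!= \mathbf{F}$ means that $\xi$ has the distribution $\mathbf{F}$, 36

$\xi_n \subset\!\!\!\Rightarrow \mathbf{F}$ means that the distribution of $\xi_n$ converges weakly to $\mathbf{F}$, 144

$\Rightarrow$ – relation $\mathbf{F}_n \Rightarrow \mathbf{F}$ means weak convergence of the distributions $\mathbf{F}_n$ to $\mathbf{F}$, 141;
for random variables, $\xi_n \Rightarrow \xi$ means that $\mathbf{F}_n \Rightarrow \mathbf{F}$, where $\xi_n \subset\!\!\!= \mathbf{F}_n$, $\xi \subset\!\!\!= \mathbf{F}$, 143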


Conditions 
[C] – the Cramér condition, 240
$[\mathbf{R}_{\beta,\rho}]$ – conditions of convergence to the stable law $\mathbf{F}_{\beta,\rho}$, 229


Subject Index 


A 
Abelian theorem, 673 
Absolutely continuous distribution, 40 
Absorbing state, 393 
Absorption, 391 
Algebra, 14 
Almost invariant 

random variable, 498 

set, 497 
Amount of information, 448 
Aperiodic Markov chain, 419 
Arithmetic distribution, 40 
Arzela—Ascoli theorem, 657 
Asymptotically normal sequence, 187 
Atom, 419 

positive, 420 


B 
Basic coding theorem, 455 
Bayes formula, 27 
Bernoulli scheme, local limit theorems for, 113 
Bernstein polynomial, 109 
Berry—Esseen theorem, 659 
Beta distribution, 179 
Binomial distribution, 37 
Bochner—Khinchin theorem, 158 
Borel 
o-algebra, 15 
set, 15 
Branching process, 180, 591 
extinction of, 182 
Brownian motion process, 549 


C 

Carathéodory theorem, 19, 622 

Cauchy sequence, 132 
Cauchy—Bunjakovsky inequality, 87, 97 




Central limit theorem, 187 

for renewal processes, 299 
Central moment, 87 
Chain, Markov, 390, 414 
Chapman—Kolmogorov equation, 582 
Characteristic function, 153 

for multivariate distribution, 171 
Chebyshev inequality, 89, 96 

exponential, 248 
Chi-squared distribution, 177 
Class 

of distributions 

exponential, 373 
superexponential, 373 

of functions, distribution determining, 148 
Coefficient 

diffusion, 604 

shift, 604 
Common probability space method, 118 
Communicating states, 392 
Complement, 16 
Completion of measure, 624 
Component, factorisation, 334 
Compound Poisson process, 552 
Condition 

Cramér, 240, 703 

Cramér on ch.f., 217 

[D;], 188 

(D2), 199 

Lyapunov, 202, 560 

[Rg,p], 229, 687 
Conditional 

density, 100 

distribution, 99 

distribution function, 70 

entropy, 451 

expectation, 70, 92, 94, 95 


probability, 22, 95
Consistent distributions, 530
Continuity axiom, 16
Continuity theorem, 134, 167, 173
Converge
in measure, 630
Convergence
almost everywhere, 630
almost surely, with probability 1, 130
in distribution, 143
in measure, 630
in probability, 129
in the mean, 132
in total variation, 653
weak, 141, 173, 649
Correlation coefficient, 86
Coupling method, 118
Covariance function, 611
Cramér
condition, 240, 703
on ch.f., 217
range, 256
series, 248
transform, 473
Crossing times, 237
Cumulant, 242
Cylinder, 528


D
Defect, 290
Degenerate distribution, 37
De Moivre–Laplace theorem, 115, 124
Density
conditional, 100
of distribution, 40
of measure, 642
transition, 583
Derivative, Radon–Nikodym, 644
Deviation, standard, 83
Diffusion
coefficient, 604
process, 603
Directly integrable function, 293
Distance, total variation, 420
Distribution, 17
absolutely continuous, 40
arithmetic, 40
beta, 179
binomial, 37
chi-squared, 177
conditional, 99
consistent, 530
degenerate, 37
Erlang, 177
exponential, 38, 71, 177
finite-dimensional, 528
function, 32
conditional, 70
properties, 33
gamma, 176
Gaussian, 37
geometric, 38
infinitely divisible, 539
invariant, 404, 419
lattice, 40
Levy, 235
multinomial, 47
multivariate normal (Gaussian), 48, 173
non-lattice, 160
normal, 37
of process, 528
of random process, 529
of random variable, 32
Poisson, 26, 39
singular, 41, 325
stable, 233
stationary, 404, 419
of waiting time, 350
subexponential, 376, 675
tail of, 228
uniform, 18, 37, 325
uniform on a cube, 18
Dominated convergence theorem, 139
Donsker–Prokhorov invariance principle, 561
Doubly stochastic matrix, 410


E
Element
random, 649
Entropy, 448
conditional, 451
Equality
Parseval, 161
Equation
backward (forward) Kolmogorov, 587, 605
Chapman–Kolmogorov, 582
renewal, 716
Equivalent
processes, 530
sequences, 109
Ergodic
Markov chain, 404
sequence, 498
state, 411
transformation, 498
Erlang distribution, 177
Essential state, 392




Event, 2 
certain, 16 
impossible, 16 
random, xiv 
renovating, 509 
tail, 316 
Events 
disjoint (mutually exclusive), 16 
independent, 22 
Excess, 280 
Existence 
of expectation, 65 
of integral, 643 
Expectation, 65 
conditional, 70, 92, 94, 95 
existence of, 65 
Exponential 
Chebyshev inequality, 248 
class of distributions, 373 
distribution, 38, 177 
polynomial, 355, 366 
Extinction of branching process, 182 


F 
Factorisation, 334 
component, 334 
Fair game, 72 
Finite-dimensional distribution, 528 
First nonnegative sum, 336 
First passage time, 278 
Flow of o-algebras, 457 
Formula 
Bayes, 27 
total probability, 25 
Function 
covariance, 611 
directly integrable, 293 
distribution, 32 
properties, 33 
large deviation rate, 244 
locally constant, 373 
lower, 546 
rate, 244 
regularly varying, 266, 665 
renewal, 279 
sample, 528 
slowly varying, 228, 665 
subexponential, 376 
test (Lyapunov), 430 
transition, 582, 583 
upper, 546 


G 
Gamma distribution, 176 


Gaussian 

distribution, 37 

process, 614 
Generating function, 161 
Geometric distribution, 38 
Gnedenko local limit theorem, 221 


H
Hahn’s theorem on decomposition of measure, 646
Harris (irreducible) Markov chain, 424
Helly theorem, 655
Hölder inequality, 88
Homogeneous 
Markov chain, 391, 416 
Markov process, 583 
process, 539 
renewal process, 285 


I 

Identity 
Pollaczek—Spitzer, 345 
Wald, 469 


Immigration, 591 
Improper random variable, 32 
Independent 
classes of events, 51 
events, 22 
random variables, 153 
trials, 24 
Indicator of event, 66 
Inequality 
Cauchy—Bunjakovsky, 87, 97 
Chebyshev, 89, 96 
Chebyshev exponential, 248 
Hölder, 88
Jensen, 88, 97 
Kolmogorov, 478 
Minkowski, 88, 133 
Schwarz, 88 
Inessential state, 392 
Infinitely divisible distribution, 539 
Information, 448 
amount of, 448 
Integrability, uniform, 135 
Integral, 630, 632, 642 


of a nonnegative measurable function, 632 


over a set, 631 
Integro-local theorems, 216 
Invariance principle, 567 
Invariant 

distribution, 419 

random variable, 498 

set, 497 




Irreducible Markov chain, 393 
Iterated logarithm, law of, 545, 546, 568 


J 
Jensen inequality, 88, 97 


K 

Karamata theorem, 668 

Kolmogorov 
equation, backward (forward), 587, 605 
inequality, 478 
theorem on consistent distributions, 56, 625 


L 
Laplace transform, 156, 241 
Large deviation 
probabilities, 126 
rate function, 244 
Large numbers, law of, 107, 188 
for renewal processes, 298 
strong, 108 
Lattice distribution, 40 
Law 
of iterated logarithm, 545, 546, 568 
of large numbers, 90, 107, 188 
for renewal processes, 298 
strong, 108 
Lebesgue theorem, 644 
Legendre transform, 244 
Levy distribution, 235 
Limit theorems, local for Bernoulli scheme, 
113 
Linear prediction, 617 
Local limit theorem, 219 
Locally constant function, 373 
Lower 
function, 546 
sequence, 318 
Lyapunov condition, 202, 560 


M 
Markov 
chain, 390, 414, 585 
aperiodic, 419 
ergodic, 404 
Harris (irreducible), 424 
homogeneous, 391, 416 
periodic, 397, 419 
reducible (irreducible), 393 
process, 580 
homogeneous, 583 
property, 390 
strong, 418 
time, 75 




Martingale, 457, 459 
Matrix 
doubly stochastic, 410 
stochastic, 391 
transition, 391 
Mean value, 65 
Measurable space, 14 
Measure, 629 
density of, 642 
extension, 19, 622 
theorem, 19, 622 
outer, 619 
signed, 629 
singular, 644 
space, 629 
Measure preserving transformation, 494 
Measure Space, 629 
Metric transitive 
sequence, 498 
transformation, 498 
Minkowski inequality, 88 
Mixed moment, 87 
Mixing transformation, 499 
Modification of process, 530 
Moment 
central, 87 
k-th order, 87 
mixed, 87 
Multinomial distribution, 47 
Multivariate normal (Gaussian) distribution, 
48, 173 


N 

Negatively correlated random variables, 87 
Non-lattice distribution, 160 

Normal distribution, 37 

Null state, 394 


O 

Oscillating random walk, 435 
Outer measure, 619 
Overshoot, 280 


P 
Parseval equality, 161 
Passage time, 336 
Path, 528 
Pathwise shift transformation, 496 
Periodic 
Markov chain, 397, 419 
state of Markov chain, 394 
Persistent state, 394 
Poisson 
distribution, 26, 39 


process, 297, 549 
theorem, 121 
Pollaczek—Spitzer identity, 345 
Polynomial 
Bernstein, 109 
exponential, 355, 366 
Positive atom, 420 
Positive state, 394, 411 
Positively correlated random variables, 87 
Posterior probability, 28 
Prediction, 616 
linear, 617 
Prior probability, 28 
Probability, 16 
conditional, 22, 95 
distribution, 17 
posterior, 28 
prior, 28 
properties of, 20 
space, 17 
sample, 528 
wide-sense, 17 
transition, 583 
Process 
branching, 180, 591 
Brownian motion, 549 
compound Poisson, 552 
continuous in mean, 536 
diffusion, 603 
distribution of, 528, 529 
Gaussian, 614 
homogeneous, 539 
Markov, 580 
modification of, 530 
Poisson, 297, 549 
random (stochastic), 527, 529 
regenerative, 600 
regular, 532 
renewal, 278 
homogeneous (stationary), 285 
semi-Markov, 593 
separable, 535 
stochastically continuous, 536, 584 
strict sense stationary, 614 
unpredictable, 611 
Wiener, 542 
with immigration, 591 
with independent increments, 539 
Prokhorov theorem, 651 
Proper random variable, 73 
Property, strong Markov, 418 
Pseudomoment, 210 




Q 
Quantile, 43 
transform, 43 


R 
Radon-Nikodym derivative, 642, 644 
Radon-Nikodym theorem, 644 
Random 
element, 414, 649 
event, xiv 
process, 527, 529 
sequence, 527 
variable, 31 
almost invariant, 498 
complex-valued, 153 
defined on Markov chain, 437 
distribution of, 32 
improper, 32 
independent of the future, 75 
invariant, 498 
proper, 73 
standardised, 85 
subexponential, 376, 675 
symmetric, 157 
tail, 317 
variables 
independent, 153 
positively (negatively) correlated, 87 
vector, 44 
walk, 277, 278, 335 
oscillating, 435 
skip-free, 384 
symmetric, 400, 401 
with reflection, 434 
Range, Cramér, 256 
Rate function, 244 
Recurrent state, 394 
Reflection, 391, 434 
Regeneration time, 600 
Regenerative process, 600 
Regression line, 103 
Regular process, 532 
Regularly varying function, 266, 665 
Renewal 
equation, 716 
function, 279 
integral theorem, 280 
local theorem, 294 
process, 278 
Renovating 
event, 509 
sequence of events, 509 


Right closed martingale (semimartingale), 459 


Ring, 14 




S 
Sample 
function, 528 
probability space, 528 
space, 414, 649 
Schwarz inequality, 88 
Semi-invariant, 242 
Semi-Markov process, 593 
Semimartingale, 458 
Separable process, 535 
Sequence 
asymptotically normal, 187 
Cauchy (in probability, a.s., in the mean), 
132 
ergodic, 498 
generated by transformation, 495 
lower, 318 
metric transitive, 498 
renovating, 509 
stationary, 493 
stochastic, 457 
stochastic recursive, 507 
tight, 148 
uniformly integrable, 135 
upper, 318 
weakly dependent, 499 
Series, Cramér, 248 
Set 
almost invariant, 497 
invariant, 497 
Shift coefficient, 604 
o-algebra, 14 
Signed measure, 629 
Singular 
distribution, 41, 325 
measure, 644 
Skip-free walk, 384 
Slowly varying function, 228, 665 
Space 
measurable, 14 
measure, 629 
of functions without discontinuities of the 
second kind, 529 
probability, 17 
sample, 414, 649 
sample probability, 528 
Spectral measure, 556 
Stable distribution, 233 
Standard deviation, 83 
Standardised random variable, 85 
State 
absorbing, 393 
ergodic, 411 
essential, 392 




inessential, 392 

periodic, 394 

persistent, 394 

positive, 411 

recurrent, 394 

transient, 394 
State, null, 394 
State, positive, 394 
Stationary 

distribution, 404, 419 

of waiting time, 350 
process, 614 
sequence, 493 
of events, 509 

Stochastic 

matrix, 391 

process, 527, 529 

recursive sequence, 507 

sequence, 457 
Stochastically continuous process, 536, 584 
Stone—Shepp integro-local theorem, 216 
Stopping time, 75, 462 

improper, 466 
Strong law of large numbers, 108 
Strong Markov property, 418 
Subexponential 

distribution, 376, 675 

function, 376 

random variable, 376, 675 
Submartingale, 458, 459 
Sum, first nonnegative, 336 
Superexponential class of distributions, 373 
Supermartingale, 458, 459 
Symmetric 

random variable, 157 

random walk, 401 


T 
Tail 
event, 316 
of distribution, 228 
random variable, 317 
Tauberian theorem, 673 
Test function, 430 
Theorem 
Abelian, 673 
Arzela—Ascoli, 657 
basic coding, 455 
Berry—Esseen, 659 
Bochner—Khinchin, 158 
Carathéodory (measure extension), 19, 622 
central limit, 187 
central limit for renewal processes, 299 
continuity, 134, 167, 173 


de Moivre–Laplace, 115, 124
dominated convergence, 139
Gnedenko local limit, 221
Hahn’s on decomposition of a measure, 646
Helly, 655
integral renewal, 280
integro-local, 216
Karamata, 668
Kolmogorov, on consistent distributions, 56, 625
Lebesgue, 644
local limit, 219
local renewal, 294
measure extension, 19, 622
Poisson, 121
Prokhorov, 651
Radon–Nikodym, 644
Stone–Shepp integro-local, 216
Tauberian, 673
two series, 322
Weierstrass, 109
Tight family of distributions, 651
Tight sequence, 148
Time
first passage, 278
Markov, 75
passage, 336
regeneration, 600
stopping, 75
waiting, 349
Total probability formula, 25, 71, 98
Total variation, 652
convergence in, 653
distance, 420
Trajectory, 528
Transform
Cramér, 473
Laplace, 156, 241
Legendre, 244
quantile, 43
Transformation
bidirectional preserving measure, 495
ergodic, 498
metric transitive, 498
mixing, 499
pathwise shift, 496
preserving measure, 494
Transient state, 394
Transition
density, 583
function, 582, 583
matrix, 391
probability, 583
Triangular array scheme, 121, 188
Two series theorem, 322


U
Undershoot, 290
Uniform distribution, 18, 37, 325
Uniform integrability, 135
right (left), 139
Unpredictable process, 611
Upper
function, 546
sequence, 318


V
Variable, random, 31
Variance, 83
Vector, random, 44


W
Waiting time, 349
stationary distribution of, 350
Wald identity, 469
fundamental, 471
Walk, random, 277, 278, 335
Weak convergence, 141, 173, 649
Weakly dependent sequence, 499
Weierstrass theorem, 109
Wiener process, 542


